AN APPROACH TO MENTAL TEST THEORY*
Frederic M. Lord


EDUCATIONAL TESTING SERVICE 


It currently seems to me that the heart of mental test theory is the 
concept of true score. 

The trouble starts as soon as it is realized that the actual test score 
obtained by a particular individual might just as well have been some 
numerical value other than the one actually observed. The individual ex- 
aminee might have guessed differently, might have been less nervous, or 
might have slept better the night before. The testing conditions might have 
been different—lighting, facilities for use of paper and pencil, presence or 
absence of distraction. Finally, any one of many different but equally appropri- 
ate psychological tests might have been constructed and administered. 

Mental test theory must deal with all these kinds of disturbing influences. 
However, we are not really interested in each of the different test scores that 
an examinee might obtain under all sorts of conditions. We are interested in 
something approximated by these scores. This something may be called the 
true score on the test. 

Many definitions of true score are possible. Each is open to serious 
objections. The hope is that when a particular definition of true score is 
used, this definition will lead to useful approximate statements about the 
true value that we are interested in, even though this is not adequately 
defined. 

Let us assume that the error of measurement, defined as the difference 
between true score and observed score, has an expected value of zero in all 
circumstances. This is the only assumption needed in order to obtain accurate 
estimates of the mean true score for a group of examinees, provided that 
the number of examinees is fairly large. I mention this in order to point out 
that some useful and meaningful results about true score can be obtained 
without making a lot of highly questionable assumptions. 

What further assumptions or definitions can profitably be used to make 
additional inferences about true scores? I plan to consider three different 
types of approaches, which I will first describe briefly. 

Perhaps the simplest and most straightforward approach is to assume 
that one can obtain independently at least two, and preferably more, test
scores such that the scores for any single examinee differ only because of 
errors of measurement. From a practical point of view, it is reasonable to 
make such an assumption about the carefully matched split-halves of a test, 
or even split-thirds, fourths, fifths, etc., provided the original test contains 
a sufficient number of items. 

Instead of this, a second approach goes back to the separate items of 
which the test is composed. According to one line of reasoning, a very large 
pool of items is defined in terms of the properties of the items actually at 
hand; true score is then defined in terms of the hypothetical performance 
of the individual examinee on this very large pool of items, assuming no 
practice effect, etc. A closely related model simply assumes the items at 
hand are a random sample drawn from a very large pool. 

Finally, a somewhat different approach is to make some restrictive
assumptions about the frequency distribution of the errors of measurement. 
The assumption that this distribution is normal with constant variance is 
adequate for many common situations. A more plausible model assumes that 
the distribution of the errors of measurement is binomial. 

It seems to me that all three of these approaches to the problem of 
making inferences about true scores are very useful. They lead to different 
models for a theory of test scores. Five of these will be discussed in some 
detail, both practical and theoretical, skipping in part or in whole those 
results that are already well known. 

Before proceeding, it will be worthwhile to point out the distinction 
between, on the one hand, the true score of a test and, on the other hand, 
a variable variously called the common factor of the test items, “the ability 
underlying the test,” or the latent continuum. A superficial distinction 
is that the true score usually has a limited range of possible values—in 
the most usual case, from zero up to some maximum possible score such as 
the total number of items in the test; whereas the “ability underlying the
test” is conveniently taken as having a possible range from $-\infty$ to $+\infty$.

A more basic distinction is that nonparallel tests of the same psycho- 
logical dimension will each necessarily have a different true-score metric 
just as each has a different observed-score metric. On the other hand, all 
tests of the same psychological dimension will have the same underlying- 
ability metric, once this has been properly defined. In this distinction lies 
the main virtue of the latent continuum—it makes possible rigorous com- 
parisons between nonparallel tests of the same psychological dimension (for 
example, comparisons to determine which of two nonparallel tests is more 
discriminating in a specified subrange of the latent continuum). There is, 
of course, a perfect curvilinear correlation between the true score on a test 
and its latent continuum; the latent continuum is simply a nonlinear trans- 
formation of the true-score scale. Thus the true scores on two nonparallel 
tests of the same psychological dimension have a perfect curvilinear
correlation; one is simply a nonlinear transformation of the other.

The advantages of the latent continuum are achieved at the cost of the 
more restrictive assumptions required for a stronger mathematical model. 
If we do not need to compare nonparallel tests of the same psychological 
dimension (in some cases even when we do), the stronger assumptions of the 
latent continuum model can be avoided and a consideration of true scores 
will be quite adequate. 

Before proceeding with a discussion of the results obtainable from 
various mathematical models for true score, it may be pointed out that 
certain of these results may be applicable to the most varied types of problems 
in all scientific areas—typically, whenever the problem involves a large 
number of observed values, each of which contains a sizable unbiased error 
of measurement. A familiar example in the area of psychometrics is a problem 
where one has a large number of item-test correlations or item-criterion 
correlations. Since each of these is subject to a large sampling error, various 
questions arise. One would like to know the frequency distribution of the 
“true” item indices—the values that would be found if sampling errors 
could be eliminated. One would like to have an estimate of the true corre- 
lation for each individual item. One would like to know the effect of discarding 
all items whose observed correlations are below some fixed value. Given the 
sampling distribution of the individual correlation coefficient, these questions 
and related ones could be answered, at least approximately, by using certain 
of the approaches to be discussed here. 


Matched-Forms Model 


The matched-forms model for the relation between true score and 
observed score involves only two, very weak assumptions: (i) the expected 
value of the error of measurement is always zero (this statement applies 
to any individual or group of individuals selected in a fashion that is ex- 
perimentally independent of the scores containing the errors in question), 
(ii) the true score of each individual is assumed to be the same on each of 
the test forms. Given these assumptions, no further definition of true score 
is needed. The matched forms themselves provide the necessary definition 
of their true score. 

Under these assumptions, given k matched forms of a test, estimates 
can be obtained for all the moments of the frequency distribution of true 
scores up through the kth order. From three matched forms, an estimate 
of the skewness of the true-score distribution can be obtained; from four 
matched forms, an estimate of kurtosis; and so forth. 

The kind of results obtained under this model may be illustrated by 
the simple relationship in (1), which holds exactly in the population of 
examinees and approximately in large samples of examinees. 

(1) $\sum_a X_{1a} X_{2a} \cdots X_{ka} = \sum_a \xi_a^k,$

i.e., the kth moment of the true scores is equal to the product of the examinee’s 
raw scores on k matched forms of the test, this product being averaged over 
all examinees. 
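
The relationship in (1) can be checked by simulation. The following is a minimal sketch, not the estimation formulas of [20]: it assumes a hypothetical gamma-shaped true-score distribution and independent normal errors (both choices are illustrative assumptions, as are the function names), builds three matched forms, and compares the averaged triple product with the actual third moment of the simulated true scores.

```python
import random


def simulate_matched_forms(n_examinees, k, error_sd, seed=0):
    """Return (true_scores, forms); forms[u][a] is examinee a's score on form u."""
    rng = random.Random(seed)
    # Hypothetical skewed true-score distribution, chosen only for illustration.
    true_scores = [rng.gammavariate(4.0, 2.5) for _ in range(n_examinees)]
    # Each form adds an independent zero-mean error to the same true score.
    forms = [[xi + rng.gauss(0.0, error_sd) for xi in true_scores]
             for _ in range(k)]
    return true_scores, forms


def product_moment(forms):
    """Average over examinees of the product of the k form scores, as in (1)."""
    n_examinees = len(forms[0])
    total = 0.0
    for a in range(n_examinees):
        prod = 1.0
        for form in forms:
            prod *= form[a]
        total += prod
    return total / n_examinees


true_scores, forms = simulate_matched_forms(n_examinees=50_000, k=3, error_sd=2.0)
estimated = product_moment(forms)
actual = sum(xi ** 3 for xi in true_scores) / len(true_scores)
print(f"estimated third moment: {estimated:.1f}, actual: {actual:.1f}")
```

The errors need not be normal for this to work; only the zero expected value and the independence of the errors across forms are used.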

The moments of the frequency distribution of the errors of measurement 
can also be estimated, as can the multivariate moments involving both true 
score and error of measurement. From available formulas, all these estimates 
will be unbiased in random samples of examinees, even when the number 
of examinees is small, and regardless of the shape of the frequency distri- 
bution sampled. 

What can be done with these estimated moments? Methods for setting 
bounds on a distribution function from its first k moments are available 
[1, 12, 22, 28, 37]. Let me quote from M. G. Kendall ([13], p. 83): “For all
ordinary purposes, therefore, a knowledge of the moments, when they all
exist, is equivalent to a knowledge of the distribution function: equivalent,
that is, in the sense that it should be possible theoretically to exhibit all the
properties of the distribution in terms of the moments ...

“If now two distributions have moments up to order n equal they
must have the same least-squares approximation [by a polynomial of degree
n] ... thus distributions which have a finite number of lower moments in
common will, in a sense, be approximations one to another. We shall encounter
many cases where, although we cannot determine a distribution
counter many cases where, although we cannot determine a distribution 
function explicitly, we may ascertain its moments at least up to some order; 
and hence we shall be able to approximate to the distribution by finding 
another distribution of known form which has the same lower moments. In 
practice, approximations of this kind often turn out to be remarkably good, 
even when only the first three or four moments are equated.” 

In practice, a Pearson curve having the same first four moments as 
the true scores may be used to approximate the true-score distribution. 
The Charlier and Edgeworth series are available when more than four 
moments are known. Other asymptotic approximations are summarized in 
a recent article by Wallace [34]. 

The mathematical derivations for the matched-forms true-score model 
and the necessary formulas for estimating the moments are scheduled to 
appear in [20]. A discussion of the implications of the model has appeared 
in [19]. In order to avoid undue repetition here, I will merely remark that the 
estimated moments of the errors of measurement and the estimated bivariate 
moments between true score and error for any given set of data can be used to 
make a significance test of the hypothesis that the errors of measurement are 
distributed normally and independently of true score. Certain empirical results 
from such significance tests will be discussed at an appropriate later point. 

The assumption that has been made here of strict parallelism between 
the test forms can be relaxed if one is willing to assume that the true scores
on all the forms are linearly related to each other. With this assumption, 
one has the prototypical problem of linear structural relationship. Selected 
references in this very relevant area are [2, 9, 10, 11, 14, 15, 21, 24, 26, 33, 
34, 35, 36]. 


Rationally Equivalent Forms Model 


The fact that the matched-forms model is conveniently applied in 
practice to split-thirds, split-fourths, or split-fifths of a test suggests that 
there should be a true-score model that bears the same relationship to the 
matched-forms model as the Kuder-Richardson reliability coefficient bears 
to the split-half method of computing reliabilities. Such a method can easily 
be formulated. It may be called the rationally equivalent forms model.

For present purposes, the Kuder-Richardson approach may be thought 
of as follows. Imagine a large number of hypothetical forms of the test, 
all rigorously parallel to each other and to the actual test at hand. (By 
“rigorously parallel” is meant that in any sufficiently large preselected 
group of examinees, the group statistics computed from the test scores 
will be the same, no matter which form of the test is used.) The crux of the 
Kuder-Richardson approach is to estimate the correlation between any two 
parallel forms of the test from item statistics computed for the single form 
actually available. In order to achieve this, one further assumption is needed, 
which, it seems to me, cannot be classified as part of the assumption of 
rational equivalence, as some writers have done. The algebraic derivation 
turns up with a term representing the covariance between item i in the test
actually at hand and the corresponding item in the hypothetical test. Some 
assumption is required. The usual assumption is, of course, that the average 
value of this unknown covariance is the same as the average interitem 
covariance for all the items in the test at hand. 

With this assumption, the correlation between rationally equivalent 
forms of the test can now be estimated. Since this correlation is a reliability 
coefficient, it can be used to estimate the variance of true scores and the 
variance of the errors of measurement. It is implicit in all this that true 
score on the test at hand is to be defined as the average score that would be 
obtained on a very large number of rationally equivalent forms. It is only 
a short step from this line of reasoning to point out that the item statistics 
on the one test actually at hand can be used not only to estimate the variance 
but also the higher moments of the true-score distribution. 

Let $X_{ua}$ be the score of examinee a on test u. Suppose that this score is
the sum of the examinee’s scores on n items, so that


(2) $X_{ua} = \sum_{i=1}^{n} x_{uia},$


where $x_{uia}$ is the score of examinee a on the ith item in test u. The desired
results are obtained by first substituting (2) into (1). For example, if there 
are three parallel tests, then 


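(3) $\sum_a \xi_a^3 = \sum_{g=1}^{n} \sum_{h=1}^{n} \sum_{i=1}^{n} \Bigl( \sum_a x_{1ga}\, x_{2ha}\, x_{3ia} \Bigr).$


Application of the line of reasoning just outlined now transforms (3) into


(4) $\frac{1}{N} \sum_a \xi_a^3 = \frac{n^3}{n(n-1)(n-2)} \sum\sum\sum_{g \neq h \neq i} \frac{1}{N} \sum_a x_{ga}\, x_{ha}\, x_{ia}.$
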
It is thus seen that under the assumptions made the third moment of the 
true scores for sufficiently large samples of examinees can be computed from 
the available item statistics on a single form of the test. Similar formulas may 
be written for the higher true-score moments. Furthermore, quantities 
like those on the right of (4) can be expressed as a linear function of the 
moments of the observed-score distribution, to be denoted by m’. Thus 
(4) can be written 





(5) $\frac{1}{N} \sum_a \xi_a^3 = \frac{n^3}{n(n-1)(n-2)} \, (m'_3 - 3m'_2 + 2m'_1).$
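
The step from item triples to observed-score moments can be verified directly. The sketch below assumes dichotomously scored (0/1) items, for which the sum of products over ordered triples of distinct items equals $X(X-1)(X-2) = X^3 - 3X^2 + 2X$ for an examinee with number-correct score $X$. The response matrix is made-up illustrative data and the function names are hypothetical.

```python
from itertools import permutations


def moment_from_item_triples(responses):
    """Third true-score moment from products over ordered triples of distinct items."""
    n_items = len(responses[0])
    n_examinees = len(responses)
    triple_sum = sum(
        x_g * x_h * x_i
        for row in responses
        for x_g, x_h, x_i in permutations(row, 3)  # positions g, h, i all differ
    )
    scale = n_items ** 3 / (n_items * (n_items - 1) * (n_items - 2))
    return scale * triple_sum / n_examinees


def moment_from_score_moments(responses):
    """Same quantity from the raw moments m'_1, m'_2, m'_3 of number-correct scores."""
    n_items = len(responses[0])
    n_examinees = len(responses)
    scores = [sum(row) for row in responses]
    m1 = sum(scores) / n_examinees
    m2 = sum(s ** 2 for s in scores) / n_examinees
    m3 = sum(s ** 3 for s in scores) / n_examinees
    scale = n_items ** 3 / (n_items * (n_items - 1) * (n_items - 2))
    return scale * (m3 - 3 * m2 + 2 * m1)


# Hypothetical 0/1 responses: 6 examinees (rows) by 5 items (columns).
responses = [
    [1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
]
from_items = moment_from_item_triples(responses)
from_scores = moment_from_score_moments(responses)
print(from_items, from_scores)
```

The second function never touches the individual item responses, which is the practical point: only the observed-score moments are needed.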

Under this mathematical model for true scores, given a sufficiently 
large sample of examinees, there is no place for the usual process of statistical 
estimation. Once the assumptions are granted, the population of test items 
is defined exactly by the item statistics in the tests actually at hand. I am 
not completely happy with this model. If tests are to be rationally equivalent, 
it seems clear that corresponding items in different tests will usually be more 
alike than would two different items chosen at random. This is certainly not 
enough reason for discarding the model, however. 

The model to be discussed next starts from a somewhat different set of 
premises, which, however, lead to formulas not quite identical to those 
illustrated by (4). 


The Item-Sampling Model 


This is the same model that was previously called the matrix-sampling 
model [16]. In this model, it is assumed that the items in the test at hand 
can be considered as a random sample of items from a very large pool or 
population of items. The true score of an individual examinee is here con- 
veniently defined as a proportion—as the probability that an item chosen at 
random from the pool of items will be one that he will answer successfully. 
More or less standard techniques of statistical inference permit the estimation 
of the examinee’s true score, thus defined, from his responses to the items 
at hand. Under this model, given that the number of examinees, N, is sufficiently
large so that we are willing to neglect quantities of order 1/N,
the observable quantity on the right-hand side of equation (4) is no longer 
equal to the third true-score moment but is an unbiased estimate of it in 
random samples of test items. The accuracy of this estimate depends on n, 
the number of items in the test at hand. 

There is another important difference between the results obtained under 
the item-sampling model and those obtained under the rationally equivalent 
forms model. Under the rationally equivalent forms model, if we wish to 
know the numerical value of a product or power of true-score moments, it is 
only necessary to multiply together the numerical values computed for the 
moments in question. Under the item-sampling model, this is not sufficient. 
A product of unbiased estimates of moments is not the same as an unbiased 
estimate of the product of moments. Special formulas must be derived to 
provide unbiased estimates of the various products called for by different 
formulas. The necessary formulas are available [16, 17] so that unbiased 
estimates can be computed for any true-score moment about the origin or 
about the mean up through the fourth order. 
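
The distinction between a product of unbiased estimates and an unbiased estimate of a product can be seen in the simplest possible case, the square of a mean. The sketch below is a generic statistical illustration, not one of the formulas of [16, 17]: it enumerates every equally likely sample of size two drawn with replacement from a small made-up population and computes exact expectations with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of three equally likely values.
population = [Fraction(1), Fraction(2), Fraction(6)]
mu = sum(population) / len(population)

# All ordered samples of size 2 drawn with replacement (each equally likely).
samples = list(product(population, repeat=2))

# Squaring the unbiased sample mean gives a biased estimate of mu squared.
naive_expectation = sum(((x1 + x2) / 2) ** 2 for x1, x2 in samples) / len(samples)

# The product of two independent observations is unbiased for mu squared.
unbiased_expectation = sum(x1 * x2 for x1, x2 in samples) / len(samples)

print("mu squared:", mu ** 2)
print("E[(sample mean)^2]:", naive_expectation)
print("E[x1 * x2]:", unbiased_expectation)
```

The excess in the naive estimate is the population variance divided by the sample size, which is why corrections of this kind matter most for short tests.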

Also, formulas are presented for the bivariate moments between true 
score and observed score. This makes possible the estimation of the regression 
of true score on observed score, even when this is curvilinear. By this method, 
therefore, one can, at least in theory, predict the true score of a given examinee 
from his observed score without assuming a linear regression. 

A useful result is obtained when linear regression is assumed. The formula 
for the regression coefficient in this case turns out to be very similar to the 
Kuder-Richardson formula-21 reliability coefficient. In fact, it differs from 
this coefficient only by quantities of order $n^{-2}$.

Now, a little algebra makes it apparent that this regression coefficient 
should in fact also be a reliability coefficient. In fact, there seems to be 
more intuitive meaning if one defines reliability as the regression of true 
score on observed score rather than as a ratio of two variances. I was at first 
somewhat puzzled, therefore, to find discrepancies between the regression 
coefficient obtained under this model and the Kuder-Richardson formula-21 
reliability. The answer to this puzzle has been independently provided by 
Nageswari Rajaratnam, working with Lee Cronbach and, more recently, 
but probably not completely independently, by myself. Rajaratnam defines 
reliability in terms of the ratio of two variances and then derives a small- 
sample formula for estimating reliability in the type of situation where the 
item-sampling model holds. When the number of examinees, N, is large, as 
explicitly assumed in my formula, but not in hers, the two formulas are the 
same. 

Equation (6) gives one form of the large-sample formula for this new 
reliability coefficient, which is also the slope of the regression of true score 
on observed score, 

(6) $\tilde r = 1 - \frac{\bar p \bar q - s_z^2}{(n-1)\, s_z^2 + s_p^2},$


where $\bar p$ is the average item difficulty, $\bar q$ is $1 - \bar p$, $s_z^2$ is the variance of
proportion-correct observed scores, and $s_p^2$ is the variance of item difficulty.

The relationship of this reliability coefficient to K-R (21) is not very 
apparent when the formula is presented in this form. The difference between 
the two coefficients can be shown to be 


(7) $\tilde r - r_{21} = \frac{s_p^2\,(1 - \tilde r)}{(n-1)\, s_z^2} = O\!\left(\frac{1}{n}\right).$


As long as the test has more than zero reliability, the difference between
the two coefficients is actually a quantity of order $n^{-2}$, since the quantity
$1 - \tilde r$ in the numerator is itself a quantity of order $n^{-1}$.

When the number of examinees is large, the sampling fluctuations in $\tilde r$
or $r_{21}$, due to sampling of items, are of the order of $n^{-3/2}$. Thus the difference
between $\tilde r$ and $r_{21}$ is a half order of magnitude smaller than the sampling
fluctuations in either. The difference between them will be negligible in typical
testing situations. This comparison differs from that between the K-R (20)
and K-R (21) coefficients, since their difference is of order $n^{-1}$ and is thus
larger than the sampling error of either. From the point of view of the item-
sampling model, the new coefficient $\tilde r$ (or better, the small sample version
derived by Rajaratnam) seems preferable to $r_{21}$ from a logical point of view
even though their difference is usually small compared to their sampling
fluctuations.
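
A small numerical check may make the comparison concrete. The sketch below assumes that (6) reads $\tilde r = 1 - (\bar p \bar q - s_z^2) / ((n-1) s_z^2 + s_p^2)$ and that (7) reads $\tilde r - r_{21} = s_p^2 (1 - \tilde r) / ((n-1) s_z^2)$. K-R (21) is computed from its usual number-correct form. The input statistics are invented for illustration and the function names are hypothetical.

```python
def kr21(n_items, p_bar, s2_z):
    """Kuder-Richardson formula 21, fed with proportion-correct statistics."""
    mean_score = n_items * p_bar          # mean number-correct score
    var_score = n_items ** 2 * s2_z       # variance of number-correct scores
    return (n_items / (n_items - 1)) * (
        1 - mean_score * (n_items - mean_score) / (n_items * var_score)
    )


def item_sampling_reliability(n_items, p_bar, s2_z, s2_p):
    """Coefficient (6) as read here: 1 - (pq - s_z^2) / ((n - 1) s_z^2 + s_p^2)."""
    q_bar = 1 - p_bar
    return 1 - (p_bar * q_bar - s2_z) / ((n_items - 1) * s2_z + s2_p)


# Hypothetical test statistics: 50 items, mean difficulty .60.
n_items, p_bar, s2_z, s2_p = 50, 0.6, 0.03, 0.02

r21 = kr21(n_items, p_bar, s2_z)
r_new = item_sampling_reliability(n_items, p_bar, s2_z, s2_p)
difference = r_new - r21
predicted = s2_p * (1 - r_new) / ((n_items - 1) * s2_z)  # right side of (7)

print(f"K-R (21): {r21:.6f}")
print(f"item-sampling coefficient: {r_new:.6f}")
print(f"difference: {difference:.6f}, predicted by (7): {predicted:.6f}")
```

With these inputs the two coefficients agree to about two decimal places, which is consistent with the remark that the difference is negligible in typical testing situations.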

In working with various mathematical models for true scores, it is often 
disturbingly difficult to find ways of checking theoretical deductions from 
the models against observable experimental results. It is therefore of interest 
to bring forward two practical applications of the item-sampling model. In 
these as yet unpublished studies, the true-score model is used to predict the 
moments of observed-score distributions. These latter distributions are 
subsequently obtained and the actual moments compared with the predicted 
values. 

In the first study, the problem is to predict the effect of lengthening 
a test upon the shape of the frequency distribution of observed scores. For 
example, if we have a skewed distribution of observed scores, will doubling 
the length of the test tend to reduce the skewness or to increase it? It turns 
out that either result may occur, depending upon the nature of the test, 
and also upon the nature of the group of examinees tested. 

On the one hand, one can imagine a group of examinees whose proportion- 
correct true scores are symmetrically distributed over a very small range, 
say, from .85 to .95. As the test consists of items that are rather easy for 
this group, the observed scores will tend to have a negatively skewed
distribution. If the test is made longer and longer, however, the observed-
score distribution will approach the symmetry of the true-score distribution. 
In this case, therefore, lengthening the test decreases the skewness of the 
observed-score distribution. 

As an example of the opposite situation, consider a case where the group 
of examinees tested has a somewhat skewed true-score distribution in the 
narrow range between .45 and .55. If the test is short and unreliable and is 
composed of 50-percent difficulty items, the symmetrical distributions of 
the errors of measurement will almost completely swamp the skewness of 
the true-score distribution, so that the observed-score distribution on a 
short test will appear nearly symmetrical with observed proportion-correct 
scores, possibly ranging from zero to one. In this case, lengthening the test 
will increase the skewness of the observed-score distribution. 
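
Both situations can be reproduced exactly under one simple set of assumptions. The sketch below assumes binomial errors, that is, an examinee with proportion-correct true score p obtains a number-correct score distributed as Binomial(n, p). The two true-score distributions are made up to match the verbal descriptions above, and the function names are hypothetical.

```python
from math import comb


def observed_pmf(n_items, true_dist):
    """PMF of number-correct scores for a mixture of binomials over true scores."""
    pmf = [0.0] * (n_items + 1)
    for p, weight in true_dist.items():
        for x in range(n_items + 1):
            pmf[x] += weight * comb(n_items, x) * p ** x * (1 - p) ** (n_items - x)
    return pmf


def skewness(pmf):
    """Third central moment divided by the 3/2 power of the variance."""
    mean = sum(x * f for x, f in enumerate(pmf))
    m2 = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
    m3 = sum((x - mean) ** 3 * f for x, f in enumerate(pmf))
    return m3 / m2 ** 1.5


# Symmetric true scores in a narrow, easy range (first situation).
easy_symmetric = {0.85: 1 / 3, 0.90: 1 / 3, 0.95: 1 / 3}
# Positively skewed true scores in a narrow range near .50 (second situation).
middle_skewed = {0.45: 0.7, 0.55: 0.3}

easy_short = skewness(observed_pmf(20, easy_symmetric))
easy_long = skewness(observed_pmf(80, easy_symmetric))
middle_short = skewness(observed_pmf(10, middle_skewed))
middle_long = skewness(observed_pmf(100, middle_skewed))

print(f"easy test:   n=20 -> {easy_short:.3f},  n=80 -> {easy_long:.3f}")
print(f"middle test: n=10 -> {middle_short:.3f}, n=100 -> {middle_long:.3f}")
```

In the first case the skewness is negative and shrinks toward zero as the test is lengthened; in the second it starts near zero and grows toward the skewness of the true-score distribution.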

What is needed is a formula for determining from the available item 
statistics and score statistics on a single form of the test which of these 
things will happen—whether the skewness will decrease or increase as the 
test is lengthened. Similar questions, of course, may be asked for the kurtosis 
of the distribution. 

How is it that true-score theory can be of use here? The reasoning is 
simply as follows. It is given that the longer test is parallel to the shorter 
test except for length, i.e., the proportion-correct true score for any examinee 
will be the same on either test. Formulas are available for unbiased estimates 
of true-score moments in terms of item statistics and score statistics on the 
short form of the test. Such formulas are also available in terms of statistics 
on the long form of the test. We have, in principle, two unbiased estimates 
of each moment of the true-score distribution, one from each form. If these 
two unbiased estimates are equated, a set of formulas expressing statistics 
on the longer test in terms of statistics on the shorter test is obtained. For 
example, if the long-form statistics are distinguished by a prime, (4) leads to 


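(8) $\frac{1}{n'(n'-1)(n'-2)} \sum_{g \neq h \neq i}^{n'(n'-1)(n'-2)} \sum_a x'_{ga}\, x'_{ha}\, x'_{ia} = \frac{1}{n(n-1)(n-2)} \sum_{g \neq h \neq i}^{n(n-1)(n-2)} \sum_a x_{ga}\, x_{ha}\, x_{ia}.$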





Since a moment of the longer test is a linear function of quantities like those 
on the left of (8), the moments on the longer test can be estimated from the 
item statistics on the shorter [17]. As already mentioned, algebraic manipu- 
lation makes the computations involved in equations like (8) feasible. This 
procedure has been applied empirically; satisfactory agreement between 
predicted and actual values was found for the data studied. 

In the second study, the problem was to predict the bivariate moments 
of the scatterplot between two parallel forms of a test from the item statistics 
of either form alone. Formulas for this have been developed [17] by a line of
reasoning exactly parallel to that just described. These bivariate moments 
determine the shape of the scatterplot and, in particular, the shape of the 
regression of one form on the other (this regression need not be strictly linear 
even though the forms are parallel*). This procedure was also applied empirically
and satisfactory agreement was found between predicted and
actual values.


Gaussian Errors of Measurement


A somewhat different perspective emerges if one is willing to make more 
specific assumptions about the frequency distribution of the errors of measure- 
ment. It is frequently an adequate approximation to assume that the errors 
of measurement are normally distributed with zero mean and constant, 
experimentally determinable variance. For convenience, this true-score 
model will be referred to as the Gaussian error model. Some extremely 
interesting and useful results can be deduced from this model, many of 
which I am sure have never been applied in mental test theory. 

First of all, as noted in ([16], pp. 2-3), when this model holds, all the
cumulants of the true-score distribution except the second are exactly equal 
to the cumulants of the observed-score distribution. Thus, if the variance 
of the errors of measurement can be estimated, then any moment of the 
true-score distribution can be expressed as a polynomial in the moments of 
the observed-score distribution. 
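
For the first four moments the conversion is short enough to write out. The sketch below assumes the error variance is known and uses the standard relations between central moments and cumulants (the third cumulant equals the third central moment; the fourth cumulant equals the fourth central moment minus three times the squared variance). The discrete true-score distribution and the function names are hypothetical, chosen only to check the algebra.

```python
def true_central_moments(mu2_obs, mu3_obs, mu4_obs, error_var):
    """Central moments of true scores from those of observed scores (normal errors)."""
    mu2_true = mu2_obs - error_var            # only the second cumulant changes
    mu3_true = mu3_obs                        # third cumulant is unchanged
    kappa4 = mu4_obs - 3 * mu2_obs ** 2       # fourth cumulant of observed scores
    mu4_true = kappa4 + 3 * mu2_true ** 2     # same cumulant, true-score variance
    return mu2_true, mu3_true, mu4_true


def central_moments(dist):
    """Second to fourth central moments of a discrete distribution {value: prob}."""
    mean = sum(v * p for v, p in dist.items())
    return tuple(sum((v - mean) ** k * p for v, p in dist.items()) for k in (2, 3, 4))


# Hypothetical true-score distribution and error variance.
true_dist = {0.0: 0.5, 1.0: 0.3, 4.0: 0.2}
error_var = 0.64

t2, t3, t4 = central_moments(true_dist)
# Observed-score moments implied by adding an independent normal error.
o2 = t2 + error_var
o3 = t3
o4 = t4 + 6 * t2 * error_var + 3 * error_var ** 2

recovered = true_central_moments(o2, o3, o4, error_var)
print("true:     ", (t2, t3, t4))
print("recovered:", recovered)
```

With sample data the observed moments would be estimates, so the recovered values inherit their sampling error.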

In my recent work I have been searching for methods of dealing with 
true scores without computing moments. The reason for doing this is that 
the use of moments has certain practical disadvantages, which I should 
mention briefly. In the first place, third- and fourth-order moments are 
subject to large sampling fluctuations, and estimated third- and fourth-order 
true-score moments even more so. Even with 2,000 examinees, the standard 
error of the estimated fourth-order moment is uncomfortably large. Higher 
order moments will seldom be useful. If only four moments of a distribution 
can be used, there is no reason to believe that this is a particularly efficient 
method of approximating the distribution. The use of Pearson curves is 
probably often effective in practice, but their effectiveness cannot be 
guaranteed. It is well known that the Charlier series and the Edgeworth 
series may give poor fits to skewed distributions when only four moments 
are used. These difficulties motivate a search for methods of making inferences 
about true scores without computing the moments of the observed-score 
distribution. 

*Ferguson has shown that, with improbable exceptions, this regression will not be
linear unless the true score is normally distributed in the group tested [7].

Before proceeding, it is appropriate to ask whether every frequency
distribution of observed scores is compatible with the Gaussian error model.
It must be admitted first of all that this model can never be strictly appropriate
for describing an observed frequency distribution of number-correct or
proportion-correct scores, simply because such scores represent a discrete 
and bounded variable whereas the scores produced under the Gaussian 
error model are necessarily continuous and unbounded (except, of course, 
in the degenerate case when the variance of the errors is zero; this case will 
be ruled out of further consideration). Avoiding a strict interpretation 
of the Gaussian error model, suppose we replace the histogram of the number- 
correct scores by a smooth frequency curve that approximates it, running 
from minus infinity to plus infinity. Will the model always be compatible 
with such a smooth frequency curve? It is intuitively clear that given some 
fixed value for the variance of the errors of measurement, there must be 
some limit to the sharpness of curvature in the frequency distribution of 
observed scores. The presence of errors of measurement necessarily obscures 
in the observed-score distribution any sharp detail that might exist in the 
true-score distribution. There is therefore a limit to the sharpness of detail 
that can appear in the observed-score distribution under this model. Necessary 
and sufficient conditions for the frequency distribution of observed scores 
to be compatible with the Gaussian error model are given by Pollard [25] 
and by Standish [29]. In the present context, their conclusions are probably 
primarily of theoretical rather than practical interest. 
The Gaussian error model is completely represented by (9): 


(9) $f(x) = \int_{-\infty}^{\infty} g(\xi)\, N(x - \xi)\, d\xi,$


where $f(x)$ is the frequency distribution of observed scores, $g(\xi)$ is the unknown
frequency distribution of true scores, and $N(x - \xi)$ is the normal distribution
for the variable $e = x - \xi$ having zero mean and known standard deviation,
$\sigma$. It is helpful to visualize this equation as referring to a scatterplot between
true score on the abscissa and observed score on the ordinate. The quantity 
under the integral sign is the product of the frequency distribution of true 
scores and the conditional frequency distribution of observed score for a 
given true score. This product is equal to the bivariate distribution between 
true score and observed score; for fixed values of $x$ and $\xi$, the quantity under
the integral sign may be thought of as representing the frequency in any 
cell of the scatterplot. The integral sign represents a set of summations. The 
equation says simply that the frequency of occurrence of a particular score 
x is equal to the quantity obtained by summing all the cell frequencies in 
the corresponding row of the scatterplot. 

Equation (9) is an integral equation. The quantity f(x) is known as the 
convolution of $g(\xi)$ and $N(\xi)$. The mathematical problem is typically to
estimate $g(\xi)$ when $f(x)$ is known. More specifically, (9) is a Fredholm equation
of the first kind. At the moment, it seems to me that methods for solving
such integral equations ([3], ch. 14; [4]; [31], sec. 3.15; and [5]) will be an
important aid in making inferences about true scores. Trumpler and Weaver
([32], chs. 1.4 and 1.5) give at least five different methods for solving equations
such as (9). In addition, they go on to more advanced problems that are of 
great interest here, such as “Correction of a Bivariate Distribution for 
Observational Errors” and “Statistical Determination of the Functional 
Relationship between Variables.” 

Almost all methods for solving (9) start off by replacing the continuous
variable $\xi$ by a discontinuous variable. For example, suppose that the
true-score distribution $g(\xi)$ can be approximated by a discontinuous distribution
denoted by $G(\xi)$. Equation (9) becomes


(10)  $f(x_m) \approx \sum_i G(\xi_i)\, N(x_m - \xi_i),$


where $x_m$ is any specified value of the observed score, and $\xi_i$ is one of a set
of specified values for the true score. For fixed values of $x$ and $\xi$, the function
$N(x_m - \xi_i)$ is a tabled value of the normal curve ordinate. Equation (10)
is a linear equation in the unknown frequencies, the $G$’s. As long as these
unknown frequencies are no more numerous than the known frequencies,
the $f$’s, (10) can be solved for the $G$’s, giving an approximate representation
of the frequency distribution of true scores.
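
The linear system in (10) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function names, the score grid, the error standard deviation, and the frequencies are all made up, and a least-squares solver is used in place of an explicit matrix inverse.

```python
import numpy as np

def normal_ordinate(e, sigma):
    """Ordinate of the normal curve with zero mean and standard deviation sigma."""
    return np.exp(-0.5 * (e / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def kernel_matrix(x_values, xi_values, sigma):
    """Matrix whose element [m, i] is N(x_m - xi_i), as in equation (10)."""
    x = np.asarray(x_values, dtype=float)[:, None]
    xi = np.asarray(xi_values, dtype=float)[None, :]
    return normal_ordinate(x - xi, sigma)

def solve_true_score_frequencies(f_obs, x_values, xi_values, sigma):
    """Solve f = K G for the unknown G's (least squares; exact if K is square
    and nonsingular)."""
    K = kernel_matrix(x_values, xi_values, sigma)
    G, *_ = np.linalg.lstsq(K, np.asarray(f_obs, dtype=float), rcond=None)
    return G

# Hypothetical example: eleven score points and a known error s.d. of 1.
x_values = np.arange(0.0, 11.0)
xi_values = np.arange(0.0, 11.0)
sigma = 1.0
G_true = np.array([0, 1, 3, 6, 10, 12, 10, 6, 3, 1, 0], dtype=float)

# Observed frequencies generated exactly from (10), with no sampling error.
f_obs = kernel_matrix(x_values, xi_values, sigma) @ G_true
G_hat = solve_true_score_frequencies(f_obs, x_values, xi_values, sigma)
```

Because `f_obs` here is generated without sampling error, `G_hat` reproduces `G_true`; with real frequencies the solution can be far less well behaved.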

I am anxious to point out the essential simplicity of (10). If the number 
of equations is equal to the number of unknowns, then the solution merely 
involves computing the inverse of the matrix whose elements are the values 
of $N(x - \xi)$. Moreover, once it has been computed, this inverse can be used
over and over again for all sorts of different tests containing the same number 
of items. Each unknown frequency in the true-score distribution is thus 
expressed as a weighted average of the frequencies in the observed-score 
distribution. 

While emphasizing a certain simplicity about the problem, I should also 
mention that there are certain serious complications, caused by the fact 
that the observed-score distribution will never fit (9) exactly, if only because 
of sampling fluctuations. This may pose serious obstacles in the way of 
making valid inferences about the true-score distribution. The procedure 
just outlined is likely to give too good a fit when applied to actual data, 
with the result that some of the G’s may turn out to be negative, or that 
the frequency distribution represented by the G’s may have a very irregular 
shape. An important current problem is finding procedures that will deal 
with these types of difficulty. 

In a recent paper, Gaffey [8] considers from the point of view of statistical 
inference a method devised by Sir Arthur Eddington, the astronomer, to 
solve equation (9). Gaffey obtains formulas for statistical estimators of 
the G’s and also for the sampling variance of these estimators. 
I think you will be interested in seeing how simple Gaffey’s method is.
It will be convenient to describe the method in a practical application, 
although the method itself is perfectly general. The data used were the 
observed scores of 388,071 students on a 50-item test. These data were chosen 
because of the large number of cases and because the score distribution looked 
approximately normal. Only .0002 of the cases made perfect or zero scores. 
The procedure used was to group the frequency distribution into 26 class 
intervals and then to compute the first, second, third, and fourth differences 
of the column of frequencies. Under the Gaussian-error model, any true-score 
frequency can then be estimated by starting with the corresponding observed- 
score frequency, subtracting from it one-fourth of the corresponding second 
difference, and adding one thirty-second of the corresponding fourth difference. 
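
The arithmetic just described can be sketched as below. The coefficients one-fourth and one thirty-second are those quoted in the text; the function names, the treatment of frequencies beyond the ends of the table as zero, and the example frequencies are assumptions made only for illustration.

```python
import numpy as np

def central_difference(freqs, order):
    """Central difference of even order, aligned with each class interval.

    Frequencies outside the table are treated as zero (an assumption).
    """
    f = np.asarray(freqs, dtype=float)
    kernel = np.array([1.0])
    for _ in range(order):
        kernel = np.convolve(kernel, [1.0, -1.0])  # order 2 -> [1, -2, 1]
    return np.convolve(f, kernel, mode="same")

def corrected_frequencies(freqs):
    """Observed frequency minus 1/4 of the second difference plus 1/32 of the
    fourth difference, as described in the text."""
    f = np.asarray(freqs, dtype=float)
    second = central_difference(f, 2)
    fourth = central_difference(f, 4)
    return f - second / 4.0 + fourth / 32.0

# Hypothetical grouped frequencies for nine class intervals.
observed = [0, 0, 1, 4, 6, 4, 1, 0, 0]
estimated = corrected_frequencies(observed)
```

For the middle interval of this made-up table the second difference is -4 and the fourth difference is 6, so the estimate is 6 + 1 + 6/32 = 7.1875. Some estimates at the extremes come out negative, which is the kind of irregularity discussed earlier.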

I have gone into detail on this method in order to indicate what a very 
simple procedure may be found for estimating a true-score distribution 
directly in terms of the frequencies of observed scores. Unfortunately, the 
results obtained by the application of Gaffey’s method to the data described 
show that these data do not meet the assumptions of the method. The 
estimated true-score frequencies formed a bell-shaped distribution and 
appeared at first to be satisfactory, but the variance of the estimated distri- 
bution was found to be much too large in view of the known variance of the 
errors of measurement in the data. Dr. Gaffey was kind enough to correspond 
with me on this problem. It appears likely that the discrete and bounded 
character of a frequency distribution of number-correct scores cannot be 
reconciled with the assumptions underlying the Gaussian error model within 
a sufficient tolerance to allow the application of Gaffey’s methods. It is to 
be hoped that the binomial model, to be discussed next, will avoid some of 
these difficulties. 

Before leaving the Gaussian error model, let me describe briefly a
beautiful result obtained by Eddington and summarized by Trumpler and
Weaver ([32], pp. 128-131). Eddington derives the regression of true score on
observed score in the form


(11)  $M_{\xi \cdot x} = x + \sigma^2 \dfrac{f'(x)}{f(x)},$


where the term on the left is the mean true score for a given observed score,
$\sigma^2$ is the variance of the errors of measurement, and $f'(x)$ is the first derivative
of the frequency distribution of the observed scores. This equation is based
on no approximations and involves no assumptions other than the basic
assumption of the Gaussian error model.

How effective (11) will be for estimating true scores from number-correct
observed scores remains to be determined. Numerical methods
are certainly available for approximating the first derivative from the first
and higher differences of the observed-score distribution. In any case, results
such as (11) give encouragement to efforts to find simple, direct methods for
estimating true score from observed score without assuming a linear regression.
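
A minimal numerical sketch of Eddington’s regression, taken in the form mean true score = x + (error variance) f'(x)/f(x), is given below. The derivative is approximated by finite differences with `numpy.gradient`; the function name, the grid, and the normal-shaped observed-score distribution are assumptions chosen only so that the answer is known in advance.

```python
import numpy as np

def eddington_regression(x, f, error_variance):
    """Mean true score at each observed score x, using a finite-difference
    approximation to f'(x). Requires f > 0 everywhere on the grid."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    f_prime = np.gradient(f, x)
    return x + error_variance * f_prime / f

# Hypothetical check: observed scores normal with mean 0 and variance 1,
# error variance 0.25. The regression should then be linear with slope
# 1 - 0.25/1 = 0.75.
x_grid = np.linspace(-3.0, 3.0, 121)
f_grid = np.exp(-0.5 * x_grid ** 2)       # unnormalized; only f'/f matters
M_hat = eddington_regression(x_grid, f_grid, 0.25)
```

In this normal case the result agrees with the classical linear regression, apart from small finite-difference error that is largest at the two end points of the grid.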

Although it is clear that the Gaussian error model cannot fit number- 
correct score data exactly, it is of interest that data have been found for 
which the Gaussian-error model is rejected by appropriate tests of statistical 
significance. As already mentioned, the necessary significance tests were 
derived from the very general matched-forms model, which I discussed first 
of all. The significance tests were applied to the vocabulary scores in four 
widely different groups of one thousand examinees each. The Gaussian-error 
hypothesis was rejected for each of the four groups at the .01 significance 
level. The results will be reported in [18]. These results do not mean that 
the Gaussian-error model should be rejected from mental test theory, but 
they do justify an attempt to find a more appropriate model for estimating 
true scores. 


The Binomial Error Model 


In the Gaussian error model, the observed scores have a normal distri- 
bution for any fixed value of true score. In trying to improve on this model, 
it is natural to consider the effect of assuming the observed scores to have 
a binomial, instead of normal, distribution when true score is fixed. This 
assumption is equivalent to the integral equation 


(12)  $f(x) = \int_0^1 g(\zeta)\, B(\zeta, x)\, d\zeta, \qquad (x = 0, 1, \dots, n),$


where $\zeta$ is the proportion-correct true score


(13)  $\zeta = \xi / n,$


and where $B(\zeta, x)$ is the binomial distribution


(14)  $B(\zeta, x) = \binom{n}{x} \zeta^x (1 - \zeta)^{n - x}.$


This model is intended to be appropriate for dealing with number-correct 
scores, not for scores obtained by formula scoring or other methods. What- 
ever its faults, there seems to be no question but that the binomial error 
model is a better approximation than is the Gaussian error model. It is also 
in many ways a fairly simple model to handle mathematically. 

For one thing, it is not difficult to express the moments of the true-score
distribution in terms of the moments of the observed-score distribution:


(15)  $\mu'_r(\zeta) = \dfrac{(n - r)!}{n!}\, m_{[r]}(x), \qquad (r = 0, 1, \dots, n).$


The left side of (15) represents the $r$th moment of the true-score distribution.
The quantity $m_{[r]}(x)$ designates the $r$th factorial moment of the observed-score
distribution. A factorial moment ([13], pp. 56-60) is simply a linear
function of the regular moments, $m'_r$; for example,


(16)  $m_{[4]}(x) = m'_4 - 6m'_3 + 11m'_2 - 6m'_1.$


It turns out that (15) is the same as the set of equations illustrated 
by (5), representing the rationally equivalent forms model. It is thus seen 
that the binomial error model is completely equivalent to the rationally 
equivalent forms model, insofar as the latter is concerned with predicting 
the shape of the frequency distribution of true scores from that of observed 
scores. The two models are not interchangeable, however, for purposes of 
inferring such things as the scatterplot between true and observed scores. 
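
The moment relation can be illustrated with a short sketch. It assumes the relation takes the form: rth moment of the proportion-correct true score = (n - r)!/n! times the rth factorial moment of the observed score, which is what the binomial model implies. The function names and the example data are hypothetical.

```python
from math import comb, factorial

import numpy as np

def factorial_moment(freqs, r):
    """r-th factorial moment, the mean of x(x-1)...(x-r+1), for scores
    x = 0, 1, ..., n with the given frequencies."""
    f = np.asarray(freqs, dtype=float)
    x = np.arange(len(f), dtype=float)
    falling = np.ones_like(x)
    for j in range(r):
        falling *= x - j
    return float(np.sum(f * falling) / np.sum(f))

def true_score_moment(freqs, r):
    """r-th moment of the proportion-correct true score under the binomial
    error model."""
    n = len(freqs) - 1
    return factorial(n - r) / factorial(n) * factorial_moment(freqs, r)

# Hypothetical check: every examinee has true score 0.3 on a 10-item test,
# so observed scores are binomial and the r-th true-score moment is 0.3**r.
n_items, zeta = 10, 0.3
freqs = [comb(n_items, x) * zeta ** x * (1 - zeta) ** (n_items - x)
         for x in range(n_items + 1)]
moments = [true_score_moment(freqs, r) for r in range(4)]
```

Only the first n moments can be recovered in this way, since factorial moments of order greater than n vanish for scores bounded by n.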

For some time now, I have been trying out various numerical methods 
for using equation (12) to infer the frequency distribution of true scores from 
a given distribution of observed scores. One set of data may be of interest 
here. They are the scores of more than 2000 professional people on a 30-item 
figure matrix test. An interesting question arises because a full one-sixth 
of the examinees have scored at the chance level or below—a full 10 percent 
score below chance. Only 7 percent of the examinees get as many as half 
the items right after correction for guessing. The presence of so much guessing 
in the score distribution must tend to obscure the shape of the true distri- 
bution of competence in the group tested; so it will be of interest to examine 
the distribution of true scores. A still more interesting question is whether 
or not the scores below the chance level simply represent random deviations 
from a true score at the chance level, or whether some of the examinees 
actually have true scores that are below the chance level. I already have 
enough data clearly to answer the second question. There is no doubt but 
that a substantial number of examinees have true scores below the chance 
level. This result could occur in one of two ways, both of which seem to 
deserve further investigation. On the one hand, it may be that some examinees 
do not understand the directions to the test and systematically proceed in 
the wrong fashion. The second, and I think more likely, possibility is that
many of the distractors for the test items are so ingeniously contrived as to 
be more attractive to some people than are the correct answers. 

In order to apply a binomial error model to data of this kind where 
there are many omitted items, it is necessary for the statistician to supply 
a response for each item that each examinee has omitted. In the case of 
five-choice items, the result is the same as would have been obtained if the 
examinee had tossed a five-sided die to determine which response to make 
on the omitted items. While certain questions can be raised as to the legitimacy 
of this procedure, it seems that it is not likely to give rise to true scores below 
the chance level unless these were already present in the data. 

Whenever the frequency distribution of true scores has been inferred
from a given set of data by means of an integral equation such as (9) or (12),
it is always possible to use the integral equation to reconstruct the bivariate
frequency distribution of true scores and observed scores, since this bivariate
frequency distribution is simply the integrand on the right side of the
equation. Given the bivariate frequency distribution between observed score
and true score for any given set of data, it is a simple matter to obtain by
numerical methods the mean true score corresponding to each different
observed score; in other words, the regression of true score on observed
score.

I have been making a search for more direct methods of estimating this 
regression—for a method similar, perhaps, to Eddington’s method in the 
Gaussian error case. There is an important theoretical difference between 
the Gaussian case and the binomial case in this connection. Under the 
Gaussian model, if a given f(x) is compatible with the model, it is likely 
to determine completely the frequency distribution of true scores. Under 
the binomial model, the observed data consist of n + 1 frequencies. These 
determine at best only n moments of the true-score distribution, by means
of (15). The higher moments of the true-score distribution can, within narrow 
limits, be assigned arbitrarily without impairing the ability of the true-score 
distribution to reproduce the observed data. All of these true-score distri- 
butions with identical lower order moments will look very much alike since 
they all have the same approximating polynomial of degree n; however, 
the indeterminacy gives rise to certain mathematical problems, as will be 
seen in a moment. 

A simple recurrence relationship, apparently as yet unpublished, can
be derived from (12):


(17)  $M_{\zeta \cdot x} = \dfrac{(x + 1)\, f(x + 1)}{(n - x)\, f(x)} \left(1 - M_{\zeta \cdot x+1}\right).$


In words, the mean true score for any given observed score, x, can be deduced 
from the frequency of occurrence of the score, the frequency of occurrence 
of the next larger score, x + 1, and the mean true score at x + 1. Thus, if 
the mean true score could be inferred for any value of x between 0 and n, 
the recurrence relationship would give the mean true score at all other 
values of x. 
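
The recurrence can be sketched in code as follows, assuming it takes the form M(x) = (x + 1) f(x + 1) [1 - M(x + 1)] / [(n - x) f(x)], which follows from the binomial distribution in (12). The function name, the starting value, and the example frequencies are assumptions; the example uses a beta-shaped true-score distribution only because the exact regression is then known.

```python
import numpy as np

def regression_from_recurrence(freqs, start_x, start_mean):
    """Mean proportion-correct true score at every observed score, given the
    mean at one score start_x. All frequencies must be positive."""
    f = np.asarray(freqs, dtype=float)
    n = len(f) - 1
    M = np.empty(n + 1)
    M[start_x] = start_mean
    # Downward from start_x: the recurrence as written.
    for x in range(start_x - 1, -1, -1):
        M[x] = (x + 1) * f[x + 1] * (1.0 - M[x + 1]) / ((n - x) * f[x])
    # Upward from start_x: the same relation solved for the mean at x + 1.
    for x in range(start_x, n):
        M[x + 1] = 1.0 - (n - x) * f[x] * M[x] / ((x + 1) * f[x + 1])
    return M

# Hypothetical check: a beta(a, b) true-score distribution on an n-item test.
# The observed frequencies are then beta-binomial (built here from their
# ratios, up to a constant), and the exact regression is (a + x)/(a + b + n).
n, a, b = 10, 2.0, 3.0
f = [1.0]
for x in range(n):
    f.append(f[-1] * (n - x) / (x + 1) * (x + a) / (n - x - 1 + b))
M_exact = np.array([(a + x) / (a + b + n) for x in range(n + 1)])
M_rec = regression_from_recurrence(f, start_x=5, start_mean=M_exact[5])
```

With real data the starting value is not known, and any error in it is carried through every later step, which is why the choice of starting value matters so much.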

As a result of the indeterminacy of the true-score distribution, however,
there is no precise way of getting started. We have $n$ independent linear
equations, but we have $n + 1$ unknown values of $M_{\zeta \cdot x}$ to determine. There
seem to be some rather good practical expedients for dealing with this
indeterminacy, however. For one thing, it is wholly reasonable to require that
the relationship between $x$ and the estimated value of $M_{\zeta \cdot x}$ be monotonic
increasing, at least in that part of the score range where the most data are
available, that is, near the mode of the score distribution. This requirement
immediately sets rather narrow limits to the values that can be assumed by
$M_{\zeta \cdot x}$. Figure 1 shows two different curves obtained for $M_{\zeta \cdot x}$ by choosing
the values of $M_{\zeta \cdot x}$ first as large as possible and then as small as possible
consistent with monotonicity in the region $x$ = 10, 11, 12. The figure suggests
that for these data the average true score can be determined within a fairly
narrow range for any given observed score.


Figure 1
Two estimates of the regression of true score, $\zeta$, on observed score, $x$ ($N$ = 4,000)
[Plot not reproduced: $M_{\zeta \cdot x}$ on the ordinate against $x$ on the abscissa.]


It should be explained that Figure 1 is derived from the observed-score 
distribution of 4,000 examinees on a 25-item vocabulary test. The observed- 
score distribution was smoothed before carrying out the computation, using 
a five-point formula given by Cureton [6]. Otherwise even with 4,000 ex- 
aminees, the regression of true score on observed score would have been 
excessively irregular, due to chance fluctuations in the adjacent frequencies 
of the observed-score distribution. 

The jagged line in Figure 2 is an estimate of the regression of true score
on observed score for a 50-item quantitative test. In this case, there were
388,071 examinees. The observed-score distribution was not smoothed in
obtaining Figure 2. The figure is presented to show the excellent agreement
between the results obtained from (17) and the linear regression of true
score on observed score computed from the K-R (21) reliability coefficient,
represented by the straight line in the figure. It turns out that under the
binomial error model, the linear regression coefficient for predicting true
from observed score, given a large number of examinees, is the K-R (21)
coefficient, not the K-R (20) coefficient or the new coefficient given in (5),
above.

Figure 2
Regression of true score on observed score estimated from K-R (21) (straight line) and
from binomial error model ($N$ = 388,071)

In the case of Figure 2, there appears to be little need for using (17) 
since the regression is very close to linear and can be well represented by 
the classical equation. This result might have been guessed in advance from 
the fact that the observed-score distribution is very nearly normal. 

The device used to obtain the jagged regression line in Figure 2 was to
try various values of $M_{\zeta \cdot x}$ for values of $x$ near the mode and to choose the
value that made the regression line most nearly straight in the middle.
This would seem to be quite a justifiable way of using (17). A more elaborate
but quite feasible procedure would be to determine the values of $M_{\zeta \cdot x}$ so
as to minimize the jaggedness of the regression line in some least squares
sense. For other methods related to this problem, see Robbins [27] and
Steinhaus [30].

Much more remains to be done in working out convenient numerical 
methods for dealing with the relatively simple problems already discussed. 
There are also certain more advanced problems, for which test theory should 
be able to provide an answer. It is hoped that the present methods may lead 
to solutions for some of the following, for example. 

1. Given the observed-score distribution for a single group of examinees 
on each of two tests that are known to measure the same psychological 
dimension, but that are not otherwise parallel, estimate the possibly curvi- 
linear functional relation between their true scores, and deduce from this 
the scatterplot of their observed scores. 

2. Given, further, the observed-score distribution of one of these two tests 
in a second and somewhat different group of examinees, estimate the observed- 
score distribution of the other test in the second group of examinees. Also 
estimate the scatterplot for the second group of examinees. 

3. Devise a method for determining whether two tests do or do not 
measure the same psychological dimension, i.e., a method for determining 
whether the true scores on two tests have a perfect curvilinear relationship 
or not. 

In closing, I would say that all five of the true-score models outlined 
yield good results when applied appropriately. All five will probably con- 
tinue to be of active interest. Empirical studies comparing the results obtained 
under different models would help to clarify their various advantages and 
disadvantages. 
A theoretical approach to the understanding of human behavior in uncertain
outcome situations is suggested, an approach which draws upon
utility theory, decision-making theory, and statistical association theory.
Experimental evidence supporting this approach as opposed to alternative
approaches is summarized. Three different formalizations are presented, and
a variety of experimental tests is suggested.


The purposes of this paper are: 

I. to discuss a controversy which has arisen between two theoretical 
approaches pertaining to the behavior of human beings in a situation 
involving choices under uncertainty; 

II. to suggest a different approach, set in a decision-making and utility 
theory framework, to account for the choice behavior to which both the 
other theories have reference; 

III. to present three mathematical models—each involving concepts 
from decision-making and utility theory—which make predictions about 
human behavior in the choice situation under discussion; 

IV. to present and discuss some experimental evidence which supports 
the general features of these models. 


The Controversy Between Two 
Theoretical Approaches to Choice Behavior 


The predictions which people make when placed in a two-choice uncertain
outcome situation have received considerable attention in recent
years [2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 17, 18, 20, 21]. In the classical
experimental situation, as first used by Humphreys [14], the subject is asked
to predict whether a light will or will not appear after a signal stimulus.
Following the signal, either of two mutually exclusive alternative events
can occur: the light can appear or the bulb can remain unlit. The two events
occur with fixed but unequal probabilities, say $\pi_1$ and $\pi_2$, in a random sequence
for a number of trials. In recent years, two lights have often been used, one
of which appears with probability $\pi$ and the other with probability $1 - \pi$
on each trial. In some cases, events other than lights have been used. In
any case, the events are independent of the subject’s behavior: regardless
of what he does, one of the two will occur on each trial. The subject is
instructed to do his best to predict correctly which of the two events will
occur. He is given no other verbal information about the situation, but he
is allowed to witness the events and thus to determine for himself the
correctness or incorrectness of each of his predictions.


*This paper was written while the author was at the Center for Advanced Study in
the Behavioral Sciences, and it has benefited from discussions with many colleagues
there. In particular, some of the features of Models I and II emerged from work with
Robert P. Abelson, and Model III was developed to its present state partly as a result of
consultations with Claude Shannon, and John C. Harsanyi (of the Department of Economics
at Stanford University). The treatment of all three models was sharpened during many
discussions with Robert M. Solow.

There are two theoretical models of interest here which yield predictions
about an individual’s behavior in a situation of this sort. The first is Estes’
statistical learning model [5], which yields the prediction that subjects will
learn to match their response ratios (the relative frequencies with which
they predict each of the two events) to the actual probabilities of occurrence
of the events, $\pi$ and $1 - \pi$. The same prediction is given by the Bush-Mosteller
model [2] when certain restrictions are applied to the parameters of
that model. A number of studies report findings in support of this
prediction [8, 12, 13, 15].

The second model, formalized by von Neumann and Morgenstern [22], 
is a game-theoretic model. According to some interpreters, one prediction 
consistent with this model is that a person will learn to maximize the ex- 
pected frequency of correct predictions. To do this, he will predict the more 
frequent event on all trials. 

To contrast the predictions from the two models, consider, for example,
a two-choice uncertain outcome situation in which over a series of trials the
two events, $E_1$ and $E_2$, occur with probabilities $\pi$ and $1 - \pi$ of .75 and .25,
respectively. Estes’ model asserts that a person will learn to make predictions
which will tend toward stabilizing at prediction of $E_1$ on 75 percent
of the trials ($p$ = .75) and prediction of $E_2$ on 25 percent of the trials. A
person who adopts this “strategy” will have .625 as his expected proportion
of correct predictions, for


$E = p\pi + (1 - p)(1 - \pi)$
$\phantom{E} = .75(.75) + .25(.25) = .625.$


On the other hand, if the subject were to adopt the “pure strategy” of
predicting $E_1$ on every trial ($p$ = 1.0), he would have .75 as his expected
proportion of correct predictions, for


$1.0(.75) + 0(.25) = .75.$


One straightforward prediction from the game-theoretic model, then, would
seem to be that a person would behave in the second way. Evidence giving
some support to this prediction may be found in [10, 11, 17, 18, 21].

Because of the considerations sketched above, some behavioral scientists
who have been influenced by game-theoretic principles have asserted that
people who match their response ratios to the probabilities of the events
are acting irrationally, in that they are failing to maximize their expected
proportion of correct predictions, a goal they could accomplish by predicting
the more frequent event on every trial. The empirical fact is that in this
situation most people do, after many trials, stabilize at matching their
response ratios to the probabilities of $E_1$ and $E_2$. To assert that this is
irrational is to rely on a highly restrictive meaning of that term. It was pointed
out by Bernoulli [1] in the first half of the eighteenth century that any theory
of rational behavior which does not incorporate the concept of the utility
or subjective value of the outcomes rather than their objective value will lead
to paradoxes of the kind under discussion. As Simon has reminded us ([20],
p. 271), one must bear in mind the distinction between objective rationality
(rationality as viewed by the experimenter) and subjective rationality
(behavior that is rational, given the perceptual and evaluational premises of
the subject). The purpose of the following section is to suggest a resolution
of the apparent paradox which has been described, a resolution which
incorporates the utility theory approach.


A Decision-Making Approach 


From decision-making theory [3, 4], a hypothesis of maximization of
expected utility may be drawn, which will account for both sorts of prediction
behavior: matching response ratios and maximizing the expected frequency
of correct predictions. According to this approach, whether a person will
tend toward one or the other prediction strategy depends on certain
conditions related to the reinforcement inherent in the situation. Where utility
is understood to refer to the subjective value of an outcome, the general
hypothesis is that a person will behave as if he were attempting to maximize
expected utility in any instance. The components of the total utility vary
in magnitude from situation to situation and thus a person’s strategy to
maximize expected utility varies.


It is reasonable to suppose that when a person is in a situation in which
the only payoff attached to the outcomes is the satisfaction of having his
prediction confirmed by the event or the dissatisfaction of having his
prediction disconfirmed by it, making a correct prediction of the rarer event
has greater utility for the person than making a correct prediction of the
more frequent event. The person derives satisfaction from playing a game
with the machine, trying to outwit it. Moreover, there is the matter of
monotony, both kinesthetic and cognitive: predicting the more frequent event
on all trials would engender the monotony of pressing the same button (a
common method of stating predictions) on trial after trial for hundreds of
trials, in addition to the monotony of the same cognitive response (e.g., left
light) on trial after trial. Under such circumstances, a subject may
maximize expected utility by matching his response ratio to the actual
probabilities of occurrence of the two events. For him, the cost of an incorrect
prediction is very low, whereas the gain from this strategy in terms of other
utilities, such as the utility of gambling and the utility of variability, may
be relatively high. By choosing a mixed strategy (i.e., splitting his
predictions in some proportion), such a subject may maximize his own total
satisfaction. If such an account is correct, then a decision-making model and
the Estes model could yield the same predictions concerning the stable state
(asymptotic) behavior of a person in a two-choice no payoff situation.

However, if the utility attached to correct predictions is increased by
a change in the conditions of the game, i.e., if there is potential satisfaction
beyond that of having an event confirm one’s prediction, or if the cost
attached to an incorrect prediction is increased, then the prediction yielded
by the hypothesis of maximization of expected utility diverges from the
prediction yielded by the Estes model. That is, if the decision-making theory
approach proposed here is correct, the introduction of systematic variation
in the reinforcements (and thus utilities) attached to correct or incorrect
predictions should be followed by systematic differences in subjects’ responses.
As the utility of a correct prediction is increased and/or the cost (negative
utility) of an incorrect prediction is increased, the person’s prediction of
the more frequent event should tend to 100 percent, i.e., he may be expected
to choose a pure strategy rather than a mixed one.

To make precise predictions about behavior, it is necessary to have
more precise formulations of the decision-making theory approach than the
informal comments above. The next section presents three such formulations.
Each of these mathematical models is set in a decision-making framework
and is based on considerations from utility theory. The models have
emerged from an attempt to construct a unified theory to account for the
stable-state choice behavior to which both the Estes model and the
game-theoretic model have reference.


Alternative Utility Models of Choice Behavior 


Of the three models which are to be presented, Model I is the simplest.
Central to it are two concepts: the utility of a correct prediction, and the
utility of variability (the utility of varying one’s responses, or relieving
monotony). In this model, there is no explicit consideration of the utility of
gambling. Thus, there is an implicit assumption, and this may be a reasonable
one, that when data on decision making consist of averages for groups of
subjects, there is little, if any, systematic effect from the utility of gambling,
since the utility of gambling is positive for some people and negative for
others.






















Model II is richer than Model I in that it includes the concepts of the 
first model—although slightly different mathematical properties are ascribed 
to them—and it also deals with the utility of gambling. 

Model III contrasts with the first two in that it incorporates the Shannon
information measure, and also in that it seems to lead to somewhat different
predictions than are yielded by either Models I or II.


Model I 


Let $\pi$ = the probability of occurrence of the more frequent event,
$p$ = the proportion of times the subject chooses the more frequent
event,
$a$ = the marginal utility of a correct prediction, and
$b$ = the marginal utility of varying one’s responses.

If the expectation that a subject’s prediction will be correct, $E_c$, is


$E_c = [p\pi + (1 - p)(1 - \pi)] = [(1 - \pi) + p(2\pi - 1)],$


then the expected utility of a correct prediction, $U_c$, is


$U_c = aE_c = a[(1 - \pi) + p(2\pi - 1)].$


If $U_v$ = the utility of varying one’s responses = $f(p)$, such that $U_v$ is
symmetrical and maximal for $p$ = .5, e.g., $p(1 - p)$ (which, though arbitrarily
chosen, has considerable intuitive appeal), then


$U_v = bp(1 - p).$


The total expected utility of a particular strategy $p$ is


$U(p) = U_c + U_v = a[(1 - \pi) + p(2\pi - 1)] + bp(1 - p).$


The strategy $p$ which maximizes expected utility is at


$\dfrac{dU(p)}{dp} = a(2\pi - 1) + b(1 - 2p) = 0$


and is


$p = \dfrac{a}{b}\left(\pi - \tfrac{1}{2}\right) + \tfrac{1}{2}.$


If $\alpha = a/b$ then


(1)  $p = \alpha\left(\pi - \tfrac{1}{2}\right) + \tfrac{1}{2}.$


It may be seen from (1) that when and only when $a = b$ and thus $\alpha = 1$,
$p = \pi$. That is, when the marginal utility of a correct prediction equals
the marginal utility of varying one’s responses, Model I makes the same
prediction regarding asymptotes (stable state behavior) as the Estes model.
This situation seems to hold when the only reward in the situation is
knowing whether one’s prediction is correct or not. However, if the
reinforcements inherent in the situation are increased, say by adding monetary
rewards and/or costs to the outcomes, and thus the utility of a correct
prediction and/or the negative utility of an incorrect prediction increases, then
$a > b$ and $\alpha > 1$. For such conditions, the predictions regarding asymptotes
yielded by Model I diverge from the predictions yielded by the Estes model.

The predictions Model I yields concerning a subject’s choice and strategy 
behavior may be generally stated as follows. For any $\pi$, 

(i) $p = 1$ when $\alpha \ge \dfrac{1}{2\pi - 1}$, 

(ii) $1 > p > \pi$ when $1 < \alpha < \dfrac{1}{2\pi - 1}$, 

(iii) $p = \pi$ when $\alpha = 1$, 

(iv) $p < \pi$ when $\alpha < 1$. 

The third prediction, that $p = \pi$, is the prediction yielded by the Estes 
model, and is a special case of Model I. The first, second, and fourth 
predictions are ones which it would seem that the Estes model is not prepared to 
make. 
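
The four predictions can be checked numerically with a minimal sketch of 
equation (1). The function names `model1_strategy` and `model1_case` are 
illustrative and not from the paper, and the sketch assumes a non-negative 
$\alpha$ and caps $p$ at 1, as prediction (i) implies. 

```python
def model1_strategy(alpha: float, pi: float) -> float:
    """Stable-state proportion p from equation (1), capped at 1."""
    if not 0.5 <= pi <= 1.0:
        raise ValueError("pi must lie in [0.5, 1]: it belongs to the more frequent event")
    p = alpha * (pi - 0.5) + 0.5
    # Prediction (i): p stays at 1 once alpha >= 1 / (2*pi - 1).
    return min(1.0, p)


def model1_case(alpha: float, pi: float) -> str:
    """Label which of predictions (i)-(iv) applies for a given alpha and pi."""
    # For pi = 0.5 the ceiling 1 / (2*pi - 1) is unbounded.
    ceiling = float("inf") if pi == 0.5 else 1.0 / (2.0 * pi - 1.0)
    if alpha >= ceiling:
        return "i"
    if alpha > 1.0:
        return "ii"
    if alpha == 1.0:
        return "iii"
    return "iv"
```

With $\pi = .75$ the ceiling $1/(2\pi - 1)$ equals 2, so `model1_case(2.0, 0.75)` 
falls under (i) and `model1_case(1.0, 0.75)` under (iii), the Estes case. 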

In Model I, $\hat{\alpha}$ may be estimated from data from (1), 

$$\hat{\alpha} = \frac{p - (1/2)}{\pi - (1/2)}.$$


In the final section of this paper, experimental results are reported 
which give some support to Model I. It will be seen that when Ss are run 
in the Estes situation under three different reinforcement conditions, the 
values of $\hat{\alpha}$ which may be estimated from the data are: 

No Payoff condition, $\hat{\alpha} = 1.00$; 
Reward condition, $\hat{\alpha} = 1.44$; 
Risk condition, $\hat{\alpha} = 1.80$. 

A relatively simple test of Model I is being conducted by the author at 
present, in which the above values of $\hat{\alpha}$ are utilized under different 
values of $\pi$. Preliminary results strongly support Model I. 
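
As a check on the three estimates, the sketch below applies the estimator 
derived from (1) to the mean proportions reported in the final section for 
the last 20 of 300 trials (.75, .86, and .95, with $\pi = .75$). The function 
name `estimate_alpha` is an assumption made for illustration. 

```python
def estimate_alpha(p_observed: float, pi: float) -> float:
    """Estimate alpha by inverting equation (1): (p - 1/2) / (pi - 1/2)."""
    if pi == 0.5:
        raise ValueError("alpha is not identified when pi = 0.5")
    return (p_observed - 0.5) / (pi - 0.5)


# Mean proportions for the final 20 of 300 trials, as reported in the text.
final_block_means = {"No Payoff": 0.75, "Reward": 0.86, "Risk": 0.95}

alpha_hats = {
    condition: round(estimate_alpha(p, pi=0.75), 2)
    for condition, p in final_block_means.items()
}
```

Under these inputs `alpha_hats` holds one rounded estimate per condition, 
which can be compared with the values listed above. 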

A more crucial test of Model I would entail the experimental 
manipulation of $\alpha$ by variation of $b$, the marginal utility of varying one’s 
responses. In the Estes No Payoff condition, with an increase in $b$, one would 
predict $p < \pi$. With a decrease in $b$, one would predict $p > \pi$. S. W. Becker, 
in a personal communication, has described data he has collected at Stanford 
University confirming these predictions. 


Model II 


Let $\pi$ = the probability of occurrence of the more frequent event, as in 
Model I, 
$p$ = the proportion of times the subject chooses the more frequent 
event, as in Model I, 
$a$ = the marginal utility of a correct prediction when and only 
when the subject chooses the more frequent event, 
$b$ = the marginal utility of a correct prediction when and only 
when the subject chooses the less frequent event, and 
$c$ = the marginal utility of varying one’s responses. 

The expected utility of a correct prediction, $E(U_c)$, is 

$$E(U_c) = ap\pi + b(1 - p)(1 - \pi),$$

and the utility of varying one’s responses, $U_v$, is $f(p)$, as in Model I, and is 

$$U_v = cp(1 - p).$$

The total expected utility of a particular strategy $p$ is 

$$U(p) = ap\pi + b(1 - p)(1 - \pi) + cp(1 - p).$$

The strategy $p$ which maximizes $U(p)$ is at 

$$\frac{dU(p)}{dp} = 0,$$

and is 

$$p = \frac{a\pi - b(1 - \pi) + c}{2c}.$$

If $\alpha = a/c$ and if $\beta = b/c$ then 

$$p = \frac{(1 - \beta) + \pi(\alpha + \beta)}{2}. \tag{2}$$

It may be seen from (2) that when $a = b = c$, and thus $\alpha = \beta = 1$, Model 
II yields, as a special case, the Estes prediction that $p = \pi$. It will be shown 
later that the case when $\alpha = \beta = 1$ is not the only case for which $p = \pi$ 
according to Model II. 

Model II is richer than Model I in several respects. One is that it makes 
possible experimental studies involving differential payoffs for the more 
frequent and less frequent events. In addition Model II may lead to the 
measurement of an individual’s specific utility of gambling. The predictions 
Model II yields concerning a subject’s choice and strategy behavior may 
be generally stated as follows. For any $\pi$: 

(i) $p = 1$ when 

$$\alpha \ge \frac{(1 - \pi)\beta + 1}{\pi} \tag{$A_1$}$$

and/or 

$$\beta \le \frac{\pi\alpha - 1}{1 - \pi}; \tag{$B_1$}$$

(ii) $1 > p > \pi$ when 

$$\frac{(1 - \pi)\beta + 1}{\pi} > \alpha > \frac{(1 - \pi)\beta + (2\pi - 1)}{\pi}$$

and/or 

$$\frac{\pi\alpha - 1}{1 - \pi} < \beta < \frac{\pi\alpha - (2\pi - 1)}{1 - \pi};$$

(iii) $p = \pi$ when 

$$\alpha = \frac{(1 - \pi)\beta + (2\pi - 1)}{\pi} \tag{$A_2$}$$

and/or 

$$\beta = \frac{\pi\alpha - (2\pi - 1)}{1 - \pi}; \tag{$B_2$}$$

(iv) $p < \pi$ when 

$$\alpha < \frac{(1 - \pi)\beta + (2\pi - 1)}{\pi}$$

and/or 

$$\beta > \frac{\pi\alpha - (2\pi - 1)}{1 - \pi}.$$

The first, second, and fourth predictions (that $p$ will be 1, between 1 and 
$\pi$, or less than $\pi$) are predictions that the Estes model is not prepared to 
make. The third prediction, that $p$ will be at $\pi$, is the Estes prediction, 
and occurs here as a special case (or set) in Model II. It is interesting to 
note that the second set of predictions above is obtained by proving the 
inequalities $(A_1) > (A_2)$ and $(B_1) < (B_2)$. 
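
The boundary conditions above can be explored with a short sketch of 
equation (2). The helper names are hypothetical, and truncating $p$ to the 
interval from 0 to 1 is an assumption consistent with prediction (i) and with 
the later remark that $p$ may be driven from zero to one. 

```python
def model2_strategy(alpha: float, beta: float, pi: float) -> float:
    """Stable-state proportion p from equation (2), truncated to [0, 1]."""
    p = ((1.0 - beta) + pi * (alpha + beta)) / 2.0
    return min(1.0, max(0.0, p))


def alpha_ceiling(beta: float, pi: float) -> float:
    """Boundary (A1): the smallest alpha for which p = 1."""
    return ((1.0 - pi) * beta + 1.0) / pi


def alpha_matching(beta: float, pi: float) -> float:
    """Boundary (A2): the alpha for which p = pi (the Estes prediction)."""
    return ((1.0 - pi) * beta + (2.0 * pi - 1.0)) / pi
```

For example, with $\beta = 2$ and $\pi = .75$, `alpha_matching` returns an 
$\alpha$ different from 1 that still yields $p = \pi$, which illustrates that 
$\alpha = \beta = 1$ is not the only matching case. 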

In Model II, there are two parameters, $\alpha$ and $\beta$, which must be 
estimated before the value of $p$ can be predicted: 

$$p = \frac{(1 - \beta) + \pi(\alpha + \beta)}{2}. \tag{2}$$

To test Model II, two successive studies would be needed, the first to obtain 
estimates of the parameters. Subjects could be run under two $\pi$’s and one 
level of reinforcement, $R_1$. Data would be required, that is, from a study 
of the following design. 

            $\pi_1$     $\pi_2$ 
  $R_1$     $p_1$       $p_2$ 

With $\pi_1$ and $\pi_2$ set, and with $p_1$ and $p_2$ observed, according to Model II, 

$$p_1 = \frac{(1 - \beta) + \pi_1(\alpha + \beta)}{2}, \qquad
  p_2 = \frac{(1 - \beta) + \pi_2(\alpha + \beta)}{2}.$$

Solving these two equations simultaneously, estimates of $\beta$ and $\alpha$ are as 
follows: 

$$\hat{\beta} = \frac{2(p_1\pi_2 - p_2\pi_1)}{\pi_1 - \pi_2} + 1,$$

$$\hat{\alpha} = \frac{2(p_1 - p_2)}{\pi_1 - \pi_2} - \hat{\beta}.$$

With $\hat{\beta}$ and $\hat{\alpha}$ computed, a second study could then be conducted using 
a $\pi$ different from those above, say $\pi_3$. Under the same reinforcement, $R_1$, 
as was used above, a test of Model II would be a test of the prediction 

$$p_3 = \frac{(1 - \hat{\beta}) + \pi_3(\hat{\alpha} + \hat{\beta})}{2}.$$
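
The two-study procedure can be sketched as follows. The function names and 
the numerical inputs in the usage note are hypothetical; no such data are 
reported in the paper. 

```python
from typing import Tuple


def estimate_model2(pi1: float, p1: float, pi2: float, p2: float) -> Tuple[float, float]:
    """Return (alpha_hat, beta_hat) from two (pi, p) observations under one R."""
    if pi1 == pi2:
        raise ValueError("the two studies must use different values of pi")
    beta_hat = 2.0 * (p1 * pi2 - p2 * pi1) / (pi1 - pi2) + 1.0
    alpha_hat = 2.0 * (p1 - p2) / (pi1 - pi2) - beta_hat
    return alpha_hat, beta_hat


def predict_p(alpha_hat: float, beta_hat: float, pi3: float) -> float:
    """Prediction for a new pi from equation (2), without truncation."""
    return ((1.0 - beta_hat) + pi3 * (alpha_hat + beta_hat)) / 2.0
```

As a consistency check with invented values, observations $p_1 = .79$ at 
$\pi_1 = .60$ and $p_2 = .9625$ at $\pi_2 = .75$ are exactly what (2) gives for 
$\alpha = 1.5$ and $\beta = .8$, and `estimate_model2` recovers those two 
parameters. 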


An important difference between Model I and Model II is that Model 
II contains separate expressions for the utility of correctly predicting the more 
frequent event and the utility of correctly predicting the less frequent event. 
Thus, whereas both models predict that $p$ will depend on the reinforcement 
hinging on correct predictions, only Model II yields a prediction for the 
situation in which the reinforcement contingent on correctly predicting the 
more frequent event differs from the reinforcement contingent on correctly 
predicting the less frequent event. According to this model, by systematically 
varying these two types of reinforcement it should be possible to induce 
variations in $p$ from zero to one! 

With such an experimental set-up it should be possible to obtain a 
measure of a subject’s specific utility of gambling. To obtain such a measure, 
it would be necessary to know the subject’s utility of money, and this could 
be determined by the method developed by Davidson, Siegel, and Suppes [3]. 
The experimenter could then determine how large a change in money is 
necessary to change the subject’s strategy, $p$, from some stable value to 
unity. This result, under the proper experimental conditions, could yield a 
measure, in monetary units, of the subject’s specific utility of gambling. 


Model III 


The third model makes a potentially interesting connection between two 
theoretical approaches: utility theory and information theory. 
Let $\pi$ = the probability of occurrence of the more frequent event, as in 
Models I and II, 
$p$ = the proportion of times the subject chooses the more frequent 
event, as in Models I and II, 
$a$ = the marginal utility of a correct prediction when there is a 
payoff (say monetary) in addition to the satisfaction of knowing 
a prediction has been confirmed, and 
$b$ = the marginal utility of “reflecting” or “reproducing,” in some 
manner (stochastic), the information in the event system. 

The expectation of a correct prediction, $E_c$, is 

$$E_c = p\pi + (1 - p)(1 - \pi) = (1 - \pi) + p(2\pi - 1).$$

Then the expected utility of a correct prediction with payoffs over and above 
the satisfaction of knowing a prediction has been confirmed, $E(U_{cp})$, is 

$$E(U_{cp}) = a[(1 - \pi) + p(2\pi - 1)].$$

The logarithm of the expectation that the subject will reflect the 
information, or stochastic structure, of the event system, for $n$ trials, is 

$$\log\left[\binom{n}{n\pi} p^{n\pi}(1 - p)^{n(1 - \pi)}\right].$$

Utilizing Stirling’s formula and expanding, the logarithm of the expectation 
for any particular trial (i.e., $n = 1$) is 

$$[\pi \log p + (1 - \pi)\log(1 - p)] - [\pi \log \pi + (1 - \pi)\log(1 - \pi)].$$

Thus the expected utility involved is 

$$U_r = b\{[\pi \log p + (1 - \pi)\log(1 - p)] - [\pi \log \pi + (1 - \pi)\log(1 - \pi)]\},$$

and the total expected utility of a particular strategy $p$ is 

$$U(p) = U_{cp} + U_r = a[(1 - \pi) + p(2\pi - 1)] + b[\pi \log p + (1 - \pi)\log(1 - p)]
  - b[\pi \log \pi + (1 - \pi)\log(1 - \pi)].$$

The strategy $p$ which maximizes expected utility can be found by setting 

$$\frac{dU(p)}{dp} = 0,$$

that is, 

$$a(2\pi - 1)p^2 + [b - a(2\pi - 1)]p - b\pi = 0.$$

By dividing by $b$ and letting $\alpha = a/b$, 

$$\alpha(2\pi - 1)p^2 + [1 - \alpha(2\pi - 1)]p - \pi = 0. \tag{3a}$$

From (3a) it may be seen that under the conditions of the conventional 
statistical learning experiment, conditions in which there is no payoff other 
than the satisfaction of knowing the correctness or incorrectness of a 
prediction, $\alpha = 0$ and thus $p = \pi$. That is, the Estes prediction is yielded for 
such conditions. However, if reinforcement is increased, then, setting 
$\lambda = 2\pi - 1$, 

$$p = \frac{-(1 - \alpha\lambda) + \sqrt{(1 - \alpha\lambda)^2 + 4\alpha\lambda\pi}}{2\alpha\lambda}. \tag{3b}$$

For three commonly used values of $\pi$ the predictions that Model III 
yields concerning a subject’s choice and strategy behavior are shown in 
Figure 1. It is interesting to notice that, in contrast to Models I and II, 
Model III does not predict that $p = 1$ for any finite value of $\alpha$; rather, in 
Model III, $p$ approaches 1 asymptotically as $\alpha$ increases. 
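
A minimal sketch of (3b) makes the asymptotic behavior easy to inspect. The 
function name `model3_strategy` is illustrative, and the special handling of 
$\alpha\lambda = 0$ (where (3a) reduces to $p - \pi = 0$) is added here only to 
avoid dividing by zero. 

```python
import math


def model3_strategy(alpha: float, pi: float) -> float:
    """Positive root of (3a), i.e. equation (3b), for the stable-state p."""
    lam = 2.0 * pi - 1.0          # lambda = 2*pi - 1
    k = alpha * lam
    if k == 0.0:
        return pi                 # (3a) reduces to p - pi = 0
    root = math.sqrt((1.0 - k) ** 2 + 4.0 * k * pi)
    return (-(1.0 - k) + root) / (2.0 * k)


def residual_3a(p: float, alpha: float, pi: float) -> float:
    """Left-hand side of (3a); it should vanish at the solution."""
    lam = 2.0 * pi - 1.0
    return alpha * lam * p * p + (1.0 - alpha * lam) * p - pi
```

Evaluating `model3_strategy` over a grid of $\alpha$ from 0 to 10 for a chosen 
$\pi$ traces a curve that rises from $\pi$ toward 1 without reaching it. 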


Relevant Experimental Evidence 


Experimental data are available to support the general approach to the 
prediction of asymptotes shared by these models as opposed to the general 
features of statistical association theory. That is, there are data supporting 
the notion that an increase in the reinforcement contingent upon a correct 
prediction, and thus the utility of a correct prediction, should result in a 
subject’s $p$ being higher than $\pi$, and that under high reinforcement $p$ should 
approach unity. These data are presented below. They are of interest here 
for the additional reason that the value of $\alpha$ can be estimated from them and 
this estimate may be used in future studies to predict $p$ (from Model I). 

A study was conducted by Siegel and Goldstein [19] in which the 
experimental situation already described was used and in which subjects were 
observed under three conditions of reinforcement: No Payoff, Reward, and 
Risk. Two lights were illuminated according to a random series, with one 
illuminating on 75 percent of the trials ($\pi = .75$) and the other on 25 percent 
of the trials ($1 - \pi = .25$). The subjects were 36 male students. They were 
randomly assigned to three sets of equal size, the assignment determining 
the condition under which the subject would be run. The three conditions 
were as follows. 

No Payoff. Under the No Payoff condition, the reinforcement for each 
prediction consisted simply of seeing the outcome, that is, seeing whether the 
right or left bulb lit, and thereby determining whether one’s prediction was 
confirmed or disconfirmed. The No Payoff condition has been the “classical” 
situation for studies of human behavior in two-choice situations. 

Figure 1 
Predicted values of $p$ from (3b) with $\alpha$ = 0 to 10 for three levels of $\pi$. 

Reward. Under the Reward condition, the reinforcement for each pre- 
diction consisted of seeing the outcome and receiving five cents for each 
correct prediction. The reward was given at the conclusion of each trial in 
which S’s prediction was confirmed. 

Risk. Under the Risk condition, the reinforcement for each prediction 
consisted of seeing the outcome, receiving five cents if the prediction was 
confirmed, and losing five cents if it was disconfirmed. At the conclusion of 
every trial, S either received or forfeited five cents, depending on whether 
his prediction had been correct or incorrect. 

Every subject, regardless of condition, was given 75 cents at the start 
of the study and told that whatever money he held at the conclusion would 
be his to keep. 

The purpose of the experiment was to test a hypothesis drawn from the 
decision-making approach introduced earlier, a hypothesis which reflects the 
general approach of the three models which have been presented. The 
hypothesis was that the asymptotic probability of a person’s predicting the 
occurrence of the more frequent event in a two-choice uncertain outcome situation 
is a function of the level of reinforcement present in the situation, such that the 
probability of predicting the more frequent event will tend toward unity as the 
rewards (positive utility) and costs (negative utility) of correct and incorrect 
predictions are increased. 

That is, using Estes’ notation [cf. 6], where $\bar{p}_1(\infty)$ is the mean 
asymptotic probability of predicting $E_1$ for a group of like individuals, the 
hypothesis was that 

$\bar{p}_1(\infty)$ under Risk > $\bar{p}_1(\infty)$ under Reward > $\bar{p}_1(\infty)$ under No Payoff. 


The data confirmed the prediction at p < .001 [16]. Ss under the Risk 
condition predicted the more frequent light oftener than Ss under the Reward 
condition, and these in turn predicted the more frequent light oftener than 
Ss under the No Payoff condition. Under the No Payoff condition, the mean 
proportion of times that the more frequent light was predicted during the 
final 20 trials of the first 100-trial series was .70. Under Reward, the mean 
was .77, and under Risk it was .93. 

For those 12 Ss (four randomly selected from each group) who were 
maintained under the same reinforcement conditions for 200 additional 
trials, the mean proportion of times that the more frequent light was pre- 
dicted during the final 20 trials of the final 100-trial series was .75 under the 
No Payoff condition, .86 under the Reward condition, and .95 under the 
Risk condition. There was no overlap among the scores of these Ss under 
the three conditions at the end of 300 trials. 


To explain changes in the sociometric configuration of a group through 
time, a problem arises of the extent to which such changes may be viewed as 
the aggregation of part-processes occurring at the level of two-person choice 
structures. A possible model is a Markov chain in which three possible states 
are mutual choice, one-way choice, and indifference, one realization for each 
pair of choosing individuals in the group. Choice data for an eighth-grade 
classroom are fitted to this model and are used to answer questions of con- 
stancy of transition probabilities, order of the chain, and sex differences. 


While the full complexity of the notion of the configuration of inter- 
personal choices as representative of a social group has been frequently 
discussed in the literature, most authors (including the present ones) have 
quickly abandoned attempts to deal with the process as a whole and have 
instead concentrated upon particular facets of the process. Although practi- 
cally everyone recognizes that the process is one which evolves and modifies 
in time, almost all past investigations have been couched in terms of single 
points in time. The literature is replete with examples of studies leading to 
the description of the configuration in a static way; there exist only a very 
few examples of attempts to deal with as many as two observations of the 
configuration, separated in time. 

In this paper a view of the configuration as a time-dependent process 
is specified and certain simplified stochastic models are developed, retaining 
all of the essential features of the process. Finally, an actual example con- 
sisting of a series of observations on the configuration for a single group is 
considered, and tests are applied to see whether the models describe the 
process reasonably well. 


Configuration of Interpersonal Relationships as a Time Process 


The standard sociometric definition represents a group in terms of the 
aggregate of all of the interpersonal relations between pairs of individuals 
constituting the group. Such an aggregate of relations is called the 
configuration for the group; we recognize that such a description is valid only at a 
specific point in time. In this sense the configuration of the group is a 
function of time, and changes with time. As time proceeds the configuration 
changes kaleidoscopically with some of the interpersonal relations remaining 
unchanged, while others change abruptly from one quality of the 
relationship to another. 

A particular relation between a designated pair of individuals, at any 
point in time, is considered to be in one of a discrete set of states representing 
the various possible qualities of this interpersonal relationship. As time 
proceeds, this state continues in force until some subsequent point at which it 
changes abruptly to some other state, continuing in the second state until 
the next change, which may be a return to the previous state or a passage 
to a new state. 

Alternatively, one may wish to look at the entire configuration 
consisting of the aggregation of all of the pairwise relations, and consider 
transitions among the possible states of the configuration. However, if there are 
$N$ individuals in the group and each relation may be in any of $k$ states, the 
number of conceivable states of the configuration is $k^{\binom{N}{2}}$, a fantastically 
large number. With $N = 10$ and $k = 3$, this number is $3 \times 10^{21}$. 
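
The size of this state space is easy to verify with exact integer 
arithmetic. The function name below is illustrative only. 

```python
from math import comb


def configuration_states(n_individuals: int, k_states: int) -> int:
    """Number of conceivable configurations: k ** C(N, 2)."""
    n_pairs = comb(n_individuals, 2)   # one relation per unordered pair
    return k_states ** n_pairs
```

For $N = 10$ there are 45 pairs, so with $k = 3$ the count is $3^{45}$, which 
is on the order of $3 \times 10^{21}$. 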


The Two-Person Relation 


Although the pressures on the two-person relationship are continuous 
from within and without, so that changes in the quality of the relation may take 
place at any point in time, administrative necessity indicates that the group 
will only be observed at discrete points in time. With this restriction, the 
only recognized changes will be those in the quality of the relation which 
may have occurred at any point following the preceding time of observation. 
Missed completely will be those cases where the relation changes in two 
steps, perhaps even returning to the previous state. This may not be at all 
serious since these states are of very brief duration. In what follows these 
transitory states which fall through the sieve of our observation are ignored. 

The possible states in which the relation may be found constitute a 
double J-nary system in which the two orientations of each asymmetric 
relation may or may not be identified. The simplest cases consist of double 
binary relations in which each individual is “on” or “off” with respect to 
his partner in the pair. It may be desirable to differentiate between the 
relations “on-off” and “off-on” as between a specified pair of individuals. The 
differentiation is difficult to interpret, unless there is a complete ordering 
among the individuals in the group in terms of status, income, etc. In these 
instances it is essential to know whether a one-way relationship goes from the 
lower to the higher, or the reverse. If, for the moment, it is assumed that 
this identification is not desired, then only three states are possible for the 
two-person relation, the state of mutual positive choice, the state of one-way 
choice, and the state of non-choice in both directions. For convenience these 
are designated by the numbers 2, 1, 0, respectively, corresponding to the 
number of positive choices expressed. 

Although it seems almost preposterous to hold the position strongly, 
logic demands initial consideration of a simple model in which the relations 
at different periods of time are independent. If this is the case then the prob- 
abilities of the relations 0, 1, and 2 are independent of the actual relations 
observed in earlier periods. When sociometric measurements at two points 
in time are available for the same group it is possible to test the hypothesis 
of independence by examining the 3 × 3 array of numbers of pairs falling 
into the various combinations of relation before by relation afterward. If 
the numbers observed are sufficiently large, a simple chi-square test with 
four degrees of freedom is appropriate. Almost certainly, this test will yield 
rejection of the hypothesis of independence as a result of persistence of 
large entries in the principal diagonal of the array. 

If the process is not random and independent in time, one might next 
investigate the nature of the dependence of the state of the process at one 
point in time upon states at other points in time. Although it is possible 
that a case might be made for a more general mode of dependence, experience 
in other investigations would indicate that the next most likely model to be 
tried is some sort of Markov process, with the nature of dependence on the 
past so far unspecified. A rather complete exposition of the theory of dis- 
crete Markov processes may be found in Feller ([2], ch. XV and XVI). At 
this point it is possible to propose a variety of different tests which vary with 
the specific nature of the Markov process under consideration. It is not 
possible to outline all of the tests which might be made; instead, a particu- 
lar collection of data will be examined and how one may determine the de- 
pendence of the process on the past will be indicated by example. 


Empirical Examination of the Markov Properties 


Data consist of the choices made by an eighth-grade class at four time 
points in the school year reported by Taba as part of a project in 
intergroup education [3].* Twenty-five students, 16 girls and 9 boys, were asked, 
“With whom would you like to sit?” This sociometric test was administered 
in September and November and in January and May of the following year. 
All pupils made three choices each except in November when two gave only 
two choices each. A complete description of the research and an able analysis 
of the static group structure is available ([3], pp. 45-75). 

Frequencies of the various transitions in the 300 two-person 
relationships between pairs of time points, which appear in Table 1, allow one to 
test whether there is independence in the sense of uniformity of distribution 
between states of these relations at different times. 

*The data were used in this study with the kind permission of Dr. Taba. 

PSYCHOMETRIKA 

TABLE 1 

Distributions of Types of Transitions in Two-Person Choice Structures, 
Test Population of 25 Eighth Graders 

 Type of Transition                              Time Periods 
                                 Sept. to  Nov. to  Sept. to  Jan. to  Nov. to  Sept. to 
 Before         After            Nov.      Jan.     Jan.      May      May      May 
 Mutual to      mutual              7         7        6         6        5        5 
                1-way               5         0        3         7        5        3 
                indiff.             3         6        6         1        3        7 
 1-way to       mutual              5         7        6         6        3        3 
                same 1-way         17        18       14        15       15        9 
                opposite 1-way      2         3        1         1        5        4 
                indiff.            21        18       24        25       23       29 
 Indifference   mutual              1         0        2         4        8        8 
 to             1-way              22        26       29        20       18       27 
                indiff.           217       215      209       215      215      205 
 Sums                             300       300      300       300      300      300 

Tables such as the following were examined (in each case, with 4 degrees 
of freedom). 




















 State of                    State in November 
 relation in 
 September               2       1       0     Sums 
 Mutual          2       7       5       3       15 
 1-way           1       5      19      21       45      χ² = 121.82 
 Indifference    0       1      22     217      240 
 Sums                   13      46     241      300 

 State of                    State in May 
 relation in 
 November                2       1       0     Sums 
 Mutual          2       5       5       3       13 
 1-way           1       3      20      23       46      χ² = 82.34 
 Indifference    0       8      18     215      241 
 Sums                   16      43     241      300 








Without doubt, the later states of two-person relationships are affected by 
previous states and it remains to be seen whether the dependence is Markovian. 
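
The independence test applied to these arrays can be sketched as an ordinary 
Pearson chi-square computation. The function name is an assumption, and 
dropping empty rows or columns from the degrees of freedom follows the 
paper's later remark about zeros rather than any stated general rule. 

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:          # skip cells in empty rows or columns
                chi2 += (observed - expected) ** 2 / expected
    rows_used = sum(1 for total in row_totals if total > 0)
    cols_used = sum(1 for total in col_totals if total > 0)
    return chi2, (rows_used - 1) * (cols_used - 1)


# September (rows) by November (columns), states ordered 2, 1, 0.
sept_to_nov = [[7, 5, 3], [5, 19, 21], [1, 22, 217]]
chi2_sept_nov, df_sept_nov = chi_square_independence(sept_to_nov)
```

The large diagonal entries dominate the statistic, which is why independence 
is rejected so decisively. 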

One asks first if the transition probabilities are constant from one pair 
of time points to the next. For example, do the data contradict the state- 
ment that starting from a mutual structure the probability of remaining 
mutual is the same for the transition September to November as for the 
November to January transition? To make the time gap of the transition 
equal, the transition from September to November is compared with that 
from November to January and the September to January transition is 
compared with that from January to May. The test consists of computing 
a chi-square value, as in a test of association, for the various 2 × 3 and 
2 × 4 tables ([1], p. 98). An example is the following. 








 Starting from a             Transition 
 Mutual and              Sept. to   Nov. to 
 ending at...            Nov.       Jan.      Sums 
 Mutual                    7          7        14 
 1-way                     5          0         5      χ² = 5.87 
 Indifference              3          6         9 
 Sums                     15         13        28 








The larger the chi-square value, the more untenable becomes the assump- 
tion of constant transition probabilities. 

For the September to November and November to January comparison 
the chi-square values are: starting from mutual, 5.87; from one-way, .80; 
from indifference, 1.36. The comparison of the September to January with 
the January to May transition yields: starting from mutual, 5.13; from one- 
way, .00; from indifference, 2.40. Although there is a suggestion that the 
probabilities of reaching states from a mutual relation differ from one time 
to another, the chi-square values do not attain significance at the .05 level. 
Thus, we will proceed as though they were constant through time. 


We now ask if the Markov chain is of order one or order two. For 
this test the distinction between same and different one-way was ignored so 
that there would be only nine transitions between the three states before and 
after, a more conventional chain. The data used to make the test are found 
in Tables 2 and 3. The present test likewise consists of making various 
chi-square tests of association, in this case for 3 × 3 tables. 


TABLE 2 

Second Order Transitions in Two-Person Choice Structures 
from September-November to January 

 Structure in                        Structure in January 
 September       November         Mutual   1-way   Indifference   Sums 
 Mutual:         mutual              4       0          3           7 
                 1-way               2       3          0           5 
                 indifference        0       0          3           3 
 1-way:          mutual              2       0          3           5 
                 1-way               4       9          6          19 
                 indifference        0       6         15          21 
 Indifference:   mutual              1       0          0           1 
                 1-way               1       9         12          22 
                 indifference        0      20        197         217 


TABLE 3 

Second Order Transitions in Two-Person Choice Structures 
from September-January to May 

 Structure in                        Structure in May 
 September       January          Mutual   1-way   Indifference   Sums 
 Mutual:         mutual              4       1          1           6 
                 1-way               1       1          1           3 
                 indifference        0       1          5           6 
 1-way:          mutual              2       4          0           6 
                 1-way               0       5         10          15 
                 indifference        1       4         19          24 
 Indifference:   mutual              0       2          0           2 
                 1-way               5      10         14          29 
                 indifference        3      15        191         209 
 Sums                               16      43        241         300 


For example, from Table 2 consider the following tabulation. 











 Structure in                      Structure in January 
 Sept.           Nov.           Mutual   1-way   Indifference   Sums 
 Mutual          Mutual            4       0          3           7 
 1-way           Mutual            2       0          3           5 
 Indifference    Mutual            1       0          0           1 
 Sums                              7       0          6          13 

χ² = 1.26 (2 degrees of freedom because of 0’s) 


A large value of chi-square would indicate that a difference in the state 
of the structures in September has influence over the movement to 
new structures in January. That is, larger chi-square values indicate a second 
or higher order chain. The observed chi-square values are summarized in 
Table 4. 

LEO KATZ AND CHARLES H. PROCTOR 

TABLE 4 

Values of Chi-Square for Tests of Order of Chain 

                      September-November to       September-January to 
                      January Transition          May Transition 
 Structure at 
 Intermediate                      Degrees of                  Degrees of 
 Time Point         Chi-Square     Freedom      Chi-Square     Freedom 
 Mutual                1.26           2            5.78           4 
 1-way                 8.02           4            4.26           4 
 Indifference          7.82*          2            4.27           4 
 Sums                 17.25*          8           14.31          12 

*Significant at 5% level. The value 7.82 when corrected for continuity becomes 
6.83, which is still significant at the 5% level. This correction consists of 
computing the next smallest value of chi-square, in this case 5.83, for the same 
set of marginal totals and using as “corrected” value the mid-point of the two, 
or 6.83. 

Obviously the chain of four-month gaps, September to January to May, 
does not show second-order dependence. However, for the chain of 
two-month gaps, the state of the structure in September appears to exert 
influence over the way in which the indifferences of November changed into 
other states in January. The following tabulation shows the direction of 
this influence (numbers in brackets are theoretical or expected frequencies). 

 Structure in                        Structure in January 
 September       November         Mutual   1-way          Indifference     Sums 
 Mutual          Indifference       0       0  [0.32]       3   [2.67]       3 
 1-way           Indifference       0       6  [2.26]      15  [18.73]      21 
 Indifference    Indifference       0      20 [23.41]     197 [193.59]     217 
 Sums                               0      26             215              241 


Apparently, the passage through indifference does not eliminate the 
tendency to return to the previous state. A similar but less marked tendency 
can be noted in the chain of four-month gaps. Possibly the investigation of 
more closely spaced time points would reveal a considerable number of 
returns to a previous state even upon passage through a mutual or a one- 
way structure. 

Having decided that the chain of four-month gaps is of order one and 
that the transition probabilities are stable, the data in Table 1 are used to 
estimate these probabilities. From a mutual structure the probabilities of 
becoming in four months a mutual, a one-way, or an indifference are .414, 
.345, and .241, respectively; from a one-way to a mutual, same one-way, 
different one-way, or indifference are .130, .315, .022, and .533, while from 
an indifference to a mutual, one-way, or indifference are .013, .102, and .885, 
respectively. The matrix of transition probabilities is constructed by 
combining the “to same” and “to different one-way” transitions. 

                              To 
 From                  2        1        0 
 Mutual          2   .414     .345     .241 
 1-way           1   .130     .337     .533 
 Indifference    0   .013     .102     .885 
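
The entries of this matrix appear to be reproduced by pooling the two 
four-month columns of Table 1 (September to January and January to May) and 
dividing each count by its row total. The sketch below makes that assumption 
explicit; the dictionary layout and function name are illustrative. 

```python
from typing import Dict

# Pooled four-month counts from Table 1 (Sept. to Jan. plus Jan. to May),
# with "same" and "opposite" one-way transitions combined.
four_month_counts: Dict[str, Dict[str, int]] = {
    "mutual": {"mutual": 6 + 6, "one_way": 3 + 7, "indifference": 6 + 1},
    "one_way": {"mutual": 6 + 6, "one_way": (14 + 1) + (15 + 1), "indifference": 24 + 25},
    "indifference": {"mutual": 2 + 4, "one_way": 29 + 20, "indifference": 209 + 215},
}


def transition_probabilities(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Estimate each row of the transition matrix as count / row total."""
    estimates = {}
    for start, row in counts.items():
        total = sum(row.values())
        estimates[start] = {end: n / total for end, n in row.items()}
    return estimates


estimated_matrix = transition_probabilities(four_month_counts)
```

Each row of `estimated_matrix` sums to one, and rounding to three decimals 
can be compared with the matrix above. 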


Since all states are accessible from each state, the Markov chain is 
irreducible and possesses a unique stationary distribution with limiting prob- 
abilities for mutual, one-way, and indifference of .051, .149, and .800, re- 
spectively. Among the 300 two-person structures in a 25-person group, one 
would expect to find as “stable” values 15 mutuals, 45 one-ways, and 240 
indifferences. It is thus not surprising that the numbers of mutuals, one-ways, 
and indifferences were observed to be so stable. The numbers of mutuals 
were 15 in September, 13 in November, 14 in January, and 16 in May; the 
one-ways were 45, 46, 47, and 43; while the indifferences were 240, 241, 239, 
and 241. What was perhaps surprising was the fact that the “initial” or 
September distribution (15-45-240) conformed exactly to the limiting values. 
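
The limiting probabilities can be approximated by repeatedly applying the 
transition matrix to any starting distribution. This power-iteration sketch 
is one of several possible methods and is not necessarily the one the authors 
used; the function name and tolerance are assumptions. 

```python
from typing import List


def stationary_distribution(matrix: List[List[float]],
                            tol: float = 1e-12,
                            max_iter: int = 10000) -> List[float]:
    """Iterate dist <- dist * P until successive distributions agree."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("power iteration did not converge")


# Rows and columns ordered mutual (2), one-way (1), indifference (0).
four_month_matrix = [
    [0.414, 0.345, 0.241],
    [0.130, 0.337, 0.533],
    [0.013, 0.102, 0.885],
]

limiting = stationary_distribution(four_month_matrix)
expected_counts = [round(300 * prob) for prob in limiting]
```

Multiplying the limiting probabilities by the 300 pairs gives the expected 
"stable" numbers of mutuals, one-ways, and indifferences. 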


Girl and Boy Subgroups 

Taba [3] notes a sex cleavage and consequently the data on transitions 
were examined separately for boys and girls. The numbers of transitions 
were counted for the 36 pair-relations among the 9 boys and for the 120 
relations among the 16 girls. These appear in Table 5. Because 
more than 85 percent of the 144 cross-sex pairs were consistently 
indifferences, the choice behavior of the whole class can almost be considered as the 
sum of two disjoint subgroups. 


TABLE 5 

Distributions of Types of Transitions in Two-Person Choice Structures, 
Boy to Boy, Girl to Girl and Cross Sex by Five Pairs of Time Points 

                  Sept. to Nov.     Nov. to Jan.      Sept. to Jan.     Jan. to May       Sept. to May 
 Type of          B to G to Cross   B to G to Cross   B to G to Cross   B to G to Cross   B to G to Cross 
 Transition        B    G    Sex     B    G    Sex     B    G    Sex     B    G    Sex     B    G    Sex 
 Mutual to... 
   Mutual          2    5     0      3    4     0      1    5     0      2    4     0      2    3     0 
   1-way           3    2     0      0    0     0      3    0     0      1    5     1      2    1     0 
   indiff.         1    2     0      3    3     0      2    4     0      0    1     0      2    5     0 
 1-way to... 
   Mutual          4    1     0      0    6     1      2    3     1      1    3     2      2    1     0 
   Same            2   15     0      8    9     1      2   10     2      3    7     5      2    6     1 
   Opposite        1    1     0      1    2     0      0    0     1      0    1     0      1    3     0 
   indiff.         3   11     7      4   13     1      6   15     3      6   11     8      5   18     6 
 Indiff. to... 
   Mutual          0    1     0      0    0     0      0    2     0      2    1     1      1    4     3 
   1-way           7   12     3      1   11    14      5   12    12      5   11     4      4   14     9 
   indiff.        13   70   134     16   72   127     15   69   125     16   76   123     15   65   125 
 Sums             36  120   144     36  120   144     36  120   144     36  120   144     36  120   144 

Tests for constancy of transition probabilities from a two-month or 
four-month gap to the next and for second- versus first-order dependency 
were performed for the girl-to-girl choices. No departure from constancy 
nor from a first-order chain was detected; however, the number of cases is 
small. Tables 6 and 7, which parallel Tables 2 and 3 above, are given for 
completeness. 


TABLE 6 

Second Order Transitions in Girls’ Two-Person Choice Structures 
from September-November to January 

 Structure in                        Structure in January 
 September       November         Mutual   1-way   Indifference   Sums 
 Mutual:         mutual              3       0          2           5 
                 1-way               2       0          0           2 
                 indifference        0       0          2           2 
 1-way:          mutual              0       0          1           1 
                 1-way               3       7          6          16 
                 indifference        0       3          8          11 
 Indifference:   mutual              1       0          0           1 
                 1-way               1       4          7          12 
                 indifference        0       8         62          70 
 Sums                               10      22         88         120 


TABLE 7 

Second Order Transitions in Girls’ Two-Person Choice Structures 
from September-January to May 

 Structure in                        Structure in May 
 September       January          Mutual   1-way   Indifference   Sums 
 Mutual:         mutual              3       1          1           5 
                 1-way               0       0          0           0 
                 indifference        0       0          4           4 
 1-way:          mutual              1       2          0           3 
                 1-way               0       4          6          10 
                 indifference        0       3         12          15 
 Indifference:   mutual              0       2          0           2 
                 1-way               3       4          5          12 
                 indifference        1       8         60          69 


Of interest is the similarity of the boys and girls with respect to the 
relative frequencies of the transitions. To show this, tables such as the 
following were examined for the combined (September to November with 
November to January) two-month transitions. 








Starting from a Mutual
and ending two months            Sub-group
later at...                   Boys     Girls     Sums

Mutual                          5        9        14
1-way                           3        2         5       χ² = .90
Indifference                    4        5         9

Sums                           12       16        28








For the two-month gap the chi-squares are: starting from mutual, .90;
starting from one-way, 1.21; starting from indifference, 1.60. For the
four-month gap they are: starting from mutual, .95; starting from one-way,
1.04; and starting from indifference, 4.06. These values show a strong
degree of similarity between the two subgroups. Although the number of
relations within the boys’ subgroup is only 30 percent of that within the
girls’ subgroup, and the boys made 56 percent as many choices as did the
girls, the boys initiated over half of the intersex choices, thus
equalizing the ratios of number of in-group choices to number of possible
in-group relations for the two subgroups. Over and above this point of
similarity, the relative amounts of exploratory and persisting choice
behaviors are similar for the two subgroups.
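
The first of these chi-squares can be reproduced from the boys-by-girls table shown above. The following is a minimal sketch of the ordinary Pearson chi-square for a contingency table, assuming that this is the statistic the author used; the function name `pearson_chi_square` is illustrative only.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a rows-by-columns table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: ending at mutual, one-way, indifference; columns: boys, girls.
from_mutual = [[5, 9], [3, 2], [4, 5]]
chi2 = pearson_chi_square(from_mutual)
degrees_of_freedom = (len(from_mutual) - 1) * (len(from_mutual[0]) - 1)
print(f"chi-square = {chi2:.2f} with {degrees_of_freedom} degrees of freedom")
```

With two degrees of freedom, a value of this size is far below the conventional 5 percent point of 5.99, which is consistent with the similarity reported in the text.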


Summary 


After proposing that interpersonal choice behavior viewed through time 
may conform to a Markov chain model, some data from an eighth-grade class 
choosing on a seating criterion were examined. Within the limits of the 
numbers of cases available no evidence was found to suggest that the transi- 
tion probabilities varied with time. For the two-month gaps there may 
possibly be second-order dependence of the chain, but the chain of four- 
month gaps did not depart significantly from first-order dependence. Inter- 
estingly, the initial or September numbers of mutuals, one-ways, and indif- 
ferences conformed exactly to the limiting distribution for a first-order chain 
of four-month gaps. 

The girls’ and boys’ subgroups behaved similarly with respect to the 
amounts of persisting or repetitive choosing and of exploratory choosing. No 
departures from the first-order Markov chain model achieved statistical 
significance among the within-sex choices. 


PATTERN VARIANTS ON A SQUARE FIELD 


S. J. PROKHOVNIK* 


THE UNIVERSITY OF NEW SOUTH WALES 


A quantitative approach to the psychology of pattern recognition requires
knowledge of the number of possible variants of any particular pattern.
The general solution for the number $[p/m^2]$ of pattern variants that p
counters can form on a square network of $m^2$ positions is obtained by
elementary group theory. The exact solution is given in terms of the
different types (symmetric, asymmetric, etc.) of patterns possible, and an
approximate formula for the total number of patterns is also developed.


Given a number p of identical counters and a square network of $m^2$
positions to place them on, it is possible to form a finite number of
configurations given by $\binom{m^2}{p}$. However, if only different
patterns are allowed, the number is reduced considerably, since certain
sets of configurations differ from one another only by rotation and/or
reflection of the field; each such set represents a single pattern. The
number of elements in the set will vary according to the type of symmetry
or asymmetry of the particular pattern. Solution for the number of pattern
variants requires differentiation among four types of configurations as
well as a number of subtypes.

For simple cases, particular solutions can be obtained, though labori- 
ously, by means of elementary arithmetic and geometry. Application of 
group theory enables a general solution applicable in all cases. The solution 
is extended with comparative ease to the case where the pattern consists 
not of identical counters but of two or more different types. This requires 
only the application of a permutation factor which will generally differ for 
each of the four types of configurations. The problem has found (in fact 
arose from) practical application in certain psychological experiments, work 
by S. Kamocki, not yet published. It may also be useful, using three-dimen- 
sional networks, in physico-chemical problems involving patterns of dis- 
crete particles, for example the pattern of amino acids in the protein molecule 
or the configurations of other molecules or physical structures. 


Problem in Terms of Postulates 


The number of ways of placing p identical counters on a square network
of $m^2$ positions is defined to be $[p/m^2]$, where configurations which
differ from one another by a rotation and/or reflection are considered
identical.


*The author is indebted to Mr. J. Sandiford for his invaluable assistance on the theo- 
retical aspects of the problem and to Mr. J. L. Griffith for his helpful criticism. 

We may establish a coordinate system on the network, with the axes
dividing the squares into four quadrants. Taking m = 2n for even networks
and m = 2n + 1 for odd networks (n = 1, 2, 3, ...), then the range of
x, y will be

\[ -\frac{m-1}{2}, \; \ldots, \; \frac{m-1}{2}. \]

Define the operation R as a counterclockwise rotation through 90°.
Thus $R(x, y) = (-y, +x)$, $R^4 = I$. Define the operation S as a
reflection in the middle vertical line, i.e., the y-axis. Thus
$S(x, y) = (-x, y)$, $S^2 = I$. The number of systems which are not
equivalent under S, R, or any combination of S and R is required.
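
For small networks the required number can be found by direct enumeration. The sketch below is not the author's method: it counts, for each of the eight combinations of R and S, the configurations left unchanged, and averages these counts (the standard orbit-counting lemma usually attributed to Burnside). Coordinates are doubled so that the half-integer positions of even networks stay integral; the function names are hypothetical.

```python
from itertools import combinations


def cells(m):
    # Doubled coordinates: for m = 4 the values are -3, -1, 1, 3
    # (standing for -3/2, ..., 3/2); for m = 3 they are -2, 0, 2.
    axis = range(-(m - 1), m, 2)
    return [(x, y) for x in axis for y in axis]


def R(point):
    """Counterclockwise rotation through 90 degrees."""
    x, y = point
    return (-y, x)


def S(point):
    """Reflection in the y-axis."""
    x, y = point
    return (-x, y)


def group_elements():
    # The eight operations I, R, R^2, R^3, S, SR, SR^2, SR^3 (R applied first).
    ops = []
    for use_s in (False, True):
        for k in range(4):
            def op(point, k=k, use_s=use_s):
                for _ in range(k):
                    point = R(point)
                return S(point) if use_s else point
            ops.append(op)
    return ops


def count_patterns(p, m):
    """Number of patterns of p counters on an m-by-m network."""
    ops = group_elements()
    fixed = 0
    for config in combinations(cells(m), p):
        chosen = frozenset(config)
        fixed += sum(1 for op in ops if frozenset(map(op, chosen)) == chosen)
    return fixed // len(ops)


print(count_patterns(2, 3), count_patterns(4, 4))
```

The enumeration is practical only for small m and p, since it visits every one of the $\binom{m^2}{p}$ configurations.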


Properties of the Group {R, S}


Denote {R, S}, the group generated by R and S together with the identity
element I, as the group $G_8$, which is of order 8. Thus
$G_8 = \{R, S\} = I, R, R^2, R^3; S, SR, SR^2, SR^3$, where by convention
the double operation SR represents the operation R followed by the
operation S.

Note that R, S are not commutative but that $SR = R^3S$ is a reflection
on the diagonal of positive gradient; $SR^3 = RS$ is a reflection on the
other diagonal; and $SR^2 = R^2S$ is a reflection on the horizontal axis.
Hence the group $G_8$ is complete as defined above. $G_8$ has three
subgroups of order 4, viz., the cyclic group

$G_{41} = \{R\} = I, R, R^2, R^3$,
$G_{42} = \{R^2, S\} = I, S, SR^2, R^2$,
$G_{43} = \{R^2, SR\} = I, SR, R^2, SR^3$;

and the five cyclic subgroups of order 2 denoted by

$G_{21} = \{R^2\}$, $G_{22} = \{S\}$, $G_{23} = \{SR^2\}$, $G_{24} = \{SR\}$, $G_{25} = \{SR^3\}$.
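
These relations are easily verified mechanically. The following sketch treats R and S as maps on coordinate pairs and composes them right to left, so that `compose(S, R)` stands for "R followed by S" as in the convention above. The helper names `compose`, `signature`, and `generate_group` are illustrative assumptions, not part of the original treatment.

```python
def R(point):
    """Counterclockwise rotation through 90 degrees."""
    x, y = point
    return (-y, x)


def S(point):
    """Reflection in the y-axis."""
    x, y = point
    return (-x, y)


def compose(*ops):
    """Compose operations right to left; compose() is the identity."""
    def composed(point):
        for op in reversed(ops):
            point = op(point)
        return point
    return composed


def signature(op):
    # A linear operation is determined by its effect on the two unit points.
    return (op((1, 0)), op((0, 1)))


def generate_group():
    """Close the identity under left multiplication by R and S."""
    identity = compose()
    found = {signature(identity): identity}
    frontier = [identity]
    while frontier:
        op = frontier.pop()
        for generator in (R, S):
            new_op = compose(generator, op)
            if signature(new_op) not in found:
                found[signature(new_op)] = new_op
                frontier.append(new_op)
    return found


group = generate_group()
identity_signature = signature(compose())
order_two = [
    sig for sig, op in group.items()
    if sig != identity_signature and signature(compose(op, op)) == identity_signature
]
print(len(group), len(order_two))
```

The five elements of order 2 found in this way correspond to the five cyclic subgroups of order 2 listed above.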


Types of Configurations Possible 


Define symmetric, semi-symmetric, anti-symmetric, and asymmetric
configurations in terms of their group properties. A pattern may then be
defined in terms of the various configurations as follows: if C is a
configuration, then all the elements of $g_8C$ form a single pattern, where
$g_8$ represents any operation in the group $G_8$. The number of p-position
patterns in a network of $m^2$ positions has been defined as $[p/m^2]$,
which is to be found. Proofs of the theorems mentioned below are based on
elementary group theory; the longer of these appear in the Appendix,
referred to by (A).

Symmetric configurations are those invariant under every combination of
R, S, i.e., under the 8-order group $G_8$. This type of configuration is
demonstrated by Figure 1 and is unique, each one representing also a unique
pattern. $[G_8]$ will designate the number of patterns of this type in any
particular case.

Semi-symmetric configurations (Figures 2, 3, and 4) are those invariant
under the subgroups of order 4, but not under $G_8$. It is easily shown (A)
that there are two, and only two, configurations of this type (each
invariant under the same subgroup of order 4) based on the same pattern,
i.e., a semi-symmetric pattern. Thus the number of semi-symmetric patterns
is $[G_4]$ and the number of semi-symmetric configurations is $2[G_4]$. It
will be found convenient to denote the number of patterns in the three
subgroups of order 4 by $[G_{41}]$, $[G_{42}]$, $[G_{43}]$, pertaining to
pattern types exemplified by Figures 2, 3, and 4, respectively.


FIGURE 1
A Symmetric $G_8$ Pattern

FIGURE 2
A Semi-symmetric $G_{41}$ Pattern

FIGURE 3
A Semi-symmetric $G_{42}$ Pattern

FIGURE 4
A Semi-symmetric $G_{43}$ Pattern


Anti-symmetric configurations (Figures 5, 6, 7, 8, and 9) are those
invariant under the subgroups of order 2, but not under any previous
groups. There will be four, and only four, such configurations based on the
same pattern, i.e., an anti-symmetric pattern (A). Thus the number of
anti-symmetric patterns is $[G_2]$, and the number of anti-symmetric
configurations is $4[G_2]$. The number of patterns due to the five
subgroups of order 2 will be denoted by $[G_{21}]$, $[G_{22}]$, $[G_{23}]$,
$[G_{24}]$, $[G_{25}]$, pertaining to the pattern types exemplified by
Figures 5, 6, 7, 8, and 9, respectively.

Asymmetric configurations (e.g., Figure 10) can be considered those
invariant only under I, or, alternatively, those which do not belong to the
2-, 4-, or 8-order groups of G. Every operation of the group G will yield a
different configuration; there are eight configurations for each asymmetric
pattern. The number of such patterns $[G_1]$ for the case $[p/m^2]$ will be
given by

\[ [G_1] = \frac{1}{8}\left\{\binom{m^2}{p} - [G_8] - 2[G_4] - 4[G_2]\right\}. \]


FIGURE 10
An Asymmetric $G_1$ Pattern
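
The four types can be tallied by brute force for small cases. In the sketch below, which is an illustration and not the author's procedure, each pattern is generated as the set of images of one configuration under the eight operations; a pattern with k distinct configurations is invariant under a group of order 8/k. Doubled integer coordinates are assumed so that even networks need no fractions, and the names `OPS` and `classify` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

# The eight operations of the group, acting on a point (x, y).
OPS = [
    lambda x, y: (x, y),      # I
    lambda x, y: (-y, x),     # R
    lambda x, y: (-x, -y),    # R^2
    lambda x, y: (y, -x),     # R^3
    lambda x, y: (-x, y),     # S
    lambda x, y: (y, x),      # SR
    lambda x, y: (x, -y),     # SR^2
    lambda x, y: (-y, -x),    # SR^3
]


def classify(p, m):
    """Return the numbers of patterns keyed by order of invariance group."""
    axis = range(-(m - 1), m, 2)          # doubled coordinates
    all_cells = [(x, y) for x in axis for y in axis]
    seen = set()
    tally = Counter()
    for config in combinations(all_cells, p):
        chosen = frozenset(config)
        if chosen in seen:
            continue
        pattern = {frozenset(op(x, y) for x, y in chosen) for op in OPS}
        seen |= pattern
        tally[8 // len(pattern)] += 1     # 8, 4, 2, or 1
    return {order: tally[order] for order in (8, 4, 2, 1)}


result = classify(4, 4)
print(result)

# The count of asymmetric patterns should agree with the formula above.
formula_g1 = (comb(16, 4) - result[8] - 2 * result[4] - 4 * result[2]) // 8
print(formula_g1 == result[1])
```

For four counters on a network of $4^2$ positions the tally agrees with the corresponding row of Table 1.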





Preliminary Theorems 


The following theorems, which follow directly from the postulates above,
are useful in the solution of the problem.


THEOREM I.

\[ [p/m^2] = [(m^2 - p)/m^2], \]

since for every pattern of p counters on an $m^2$ network there is a
corresponding pattern of $(m^2 - p)$ blanks and vice versa.

THEOREM II. For every configuration invariant under $G_{22} = \{S\}$ there
is an equivalent one invariant under $G_{23} = \{SR^2\}$ and vice versa (A).
Figures 6 and 7 exemplify a pair of such configurations.

Hence

\[ [G_{22}] = [G_{23}] = [G_{22} + G_{23}]. \]

From similar considerations (e.g., Figures 8 and 9),

\[ [G_{24}] = [G_{25}] = [G_{24} + G_{25}]. \]


THEOREM III. In any odd network the number of patterns for $[G_8]$,
$[G_4]$, and $[G_{21}]$ will be the same for p = 2r + 1 (counters) as it is
for p = 2r (A).


Method 


It is clear that different considerations apply for odd and even cases,
both for networks and for number of points forming the pattern. Further,
symmetrical ($G_8$) patterns of p points over an $m^2$ network are possible
only if p is a multiple of 4 and, in the case when m is odd, also if p - 1
is a multiple of 4. Hence solutions of $[p/m^2]$ are given for m = 2n,
2n + 1, and p = 4r, 4r + 1, where r, n = 0, 1, 2, ....

Now any part of a solution for 4r or 4r + 1 counters which involves 
only multiples of 2r is applicable for all even (p = 2s) or all odd (p = 2s + 1) 
cases respectively, on the same network. This leads to the following theorem. 


THEOREM IV. If we substitute 2r = s in the solutions of $[4r/m^2]$ which
involve only multiples of 2r, then we obtain the solutions of $[2s/m^2]$
when s = 1, 3, 5, .... A similar relationship applies between
$[(4r + 1)/m^2]$ and $[(2s + 1)/m^2]$.


PROOF. It is clear that the relationship is necessary. It is also
sufficient, since if s = 2r in the solutions of $[2s/m^2]$ one obtains
corresponding solutions of $[4r/m^2]$. Hence all the solutions of
$[2s/m^2]$ are derived according to the theorem from those of $[4r/m^2]$.

Following Theorem IV, the solutions for p = 2s, 2s + 1 (s = 1, 3, 5, ...)
are given by replacing 2r by s in the starred (*) set of the relevant
solutions for p = 4r, 4r + 1, respectively. The starred sets are then the
complete solutions for the cases where p or p - 1 is even but not a
multiple of 4.

The calculation can be simplified by noting that for odd networks
(m = 2n + 1) the argument with respect to diagonals is identical to that
with respect to axes. Hence, in such networks,

\[ [G_{42}] = [G_{43}]; \quad [G_{22} + G_{23}] = [G_{24} + G_{25}]. \]

In deriving the general solutions, a set of m points invariant under a
particular group or subgroup of G can be considered as being composed of
subsets of single points, pairs, quartets, and octets invariant under the
same group or subgroup. The solution then consists of all combinations of
such subsets invariant under the particular group considered. For instance,
8-point invariant sets occur only in the symmetric ($G_8$) patterns, being
quartets of pairs of points symmetric about a diagonal or axis. Thus the
solution for $[G_8]$ consists of (all possible combinations of) quartets of
diagonal or axial points and/or the symmetric octets as described above.

Numerical evaluations of the general solutions for a number of patterns 
and networks are given in Table 1. The results agree with arithmetical 
enumeration methods which are possible though laborious for small n, r. 


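General Solutions

(i) Solution for $[4r/(2n)^2]$.

\[ [G_8] = \binom{n}{r} + \binom{n}{r-2}\binom{\frac{n^2-n}{2}}{1} + \binom{n}{r-4}\binom{\frac{n^2-n}{2}}{2} + \cdots, \]

\[ [G_{41}] = \frac{1}{2}\left\{\binom{n^2}{r} - [G_8]\right\}, \]

\[ [G_{42}] = \frac{1}{2}\left\{\binom{n^2}{r} - [G_8]\right\}, \]

\[ [G_{43}] = \frac{1}{2}\left\{\binom{2n}{2r} + \binom{2n}{2r-2}\binom{n^2-n}{1} + \binom{2n}{2r-4}\binom{n^2-n}{2} + \cdots - [G_8]\right\}, \]

\[ [G_2] = [G_{21}] + [G_{22} + G_{23}] + [G_{24} + G_{25}], \]

that is,

\[ [G_{21}] = \frac{1}{4}\left\{\binom{2n^2}{2r} - [G_8] - 2[G_4]\right\}, \]

\[ [G_{22} + G_{23}] = \frac{1}{2}\left\{\binom{2n^2}{2r} - [G_8] - 2[G_{42}]\right\}, \]

\[ [G_{24} + G_{25}] = \frac{1}{2}\left\{\binom{2n^2-n}{2r} + \binom{2n}{2}\binom{2n^2-n}{2r-1} + \binom{2n}{4}\binom{2n^2-n}{2r-2} + \cdots - [G_8] - 2[G_{43}]\right\}, \]

\[ [G_1] = \frac{1}{8}\left\{\binom{4n^2}{4r} - [G_8] - 2[G_4] - 4[G_2]\right\}. \]


(ii) Solution for $[(4r + 1)/(2n)^2]$.

$[G_8]^*$ : Nil.
$[G_4]^*$ : Nil.
$[G_2]^*$ : $[G_{21}]$, $[G_{22} + G_{23}]$, Nil.

\[ [G_{24} + G_{25}] = \frac{1}{2}\left\{\binom{2n}{1}\binom{2n^2-n}{2r} + \binom{2n}{3}\binom{2n^2-n}{2r-1} + \binom{2n}{5}\binom{2n^2-n}{2r-2} + \cdots\right\}, \]

\[ [G_1] = \frac{1}{8}\left\{\binom{4n^2}{4r+1} - 4[G_2]\right\}. \]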
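(iii) Solution for $[4r/(2n + 1)^2]$.

\[ [G_8] = \binom{2n}{r} + \binom{2n}{r-2}\binom{\frac{n^2-n}{2}}{1} + \binom{2n}{r-4}\binom{\frac{n^2-n}{2}}{2} + \cdots, \]

\[ [G_{41}] = \frac{1}{2}\left\{\binom{n^2+n}{r} - [G_8]\right\}, \]

\[ [G_{42}] + [G_{43}] = \binom{2n}{2r} + \binom{2n}{2r-2}\binom{n^2}{1} + \binom{2n}{2r-4}\binom{n^2}{2} + \cdots - [G_8], \]

\[ [G_{21}] = \frac{1}{4}\left\{\binom{2n^2+2n}{2r} - [G_8] - 2[G_4]\right\}, \]

\[ [G_{22} + G_{23}] + [G_{24} + G_{25}] = \binom{2n^2+n}{2r} + \binom{2n+1}{2}\binom{2n^2+n}{2r-1} + \binom{2n+1}{4}\binom{2n^2+n}{2r-2} + \cdots - [G_8] - 2[G_{42}], \]

\[ [G_1]^* = \frac{1}{8}\left\{\binom{(2n+1)^2}{4r} - [G_8] - 2[G_4] - 4[G_2]\right\}. \]


(iv) Solution for $[(4r + 1)/(2n + 1)^2]$.

$[G_8]$, $[G_{41}]$ : as for (iii), by Theorem III.
$[G_{42}]^* + [G_{43}]^*$ : as for (iii).
$[G_2]^*$ : $[G_{21}]$, as for (iii);

\[ [G_{22} + G_{23}] + [G_{24} + G_{25}] = \binom{2n+1}{1}\binom{2n^2+n}{2r} + \binom{2n+1}{3}\binom{2n^2+n}{2r-1} + \binom{2n+1}{5}\binom{2n^2+n}{2r-2} + \cdots - [G_8] - 2[G_{42}], \]

\[ [G_1] = \frac{1}{8}\left\{\binom{(2n+1)^2}{4r+1} - [G_8] - 2[G_4] - 4[G_2]\right\}. \]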
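
As an illustration of how such a solution is evaluated, the sketch below computes case (i), p = 4r counters on an even network of $(2n)^2$ positions. It rests on one reading of the formulas, namely counting single points, pairs, quartets, and octets invariant under each subgroup as described under Method; the function name and the intermediate variable names are hypothetical.

```python
from math import comb


def even_network_solution(n, r):
    """Return ([G8], [G4], [G2], [G1]) for p = 4r on a (2n) x (2n) network."""
    octets = (n * n - n) // 2

    # Fully symmetric: quartets of diagonal points and symmetric octets.
    g8 = sum(comb(n, r - 2 * k) * comb(octets, k) for k in range(r // 2 + 1))

    # Order-4 subgroups: rotations, axial reflections, diagonal reflections.
    g41 = (comb(n * n, r) - g8) // 2
    g42 = (comb(n * n, r) - g8) // 2
    diagonal_pairs_and_quartets = sum(
        comb(2 * n, 2 * r - 2 * a) * comb(n * n - n, a) for a in range(r + 1)
    )
    g43 = (diagonal_pairs_and_quartets - g8) // 2
    g4 = g41 + g42 + g43

    # Order-2 subgroups.
    g21 = (comb(2 * n * n, 2 * r) - g8 - 2 * g4) // 4
    g22_23 = (comb(2 * n * n, 2 * r) - g8 - 2 * g42) // 2
    diagonal_reflection = sum(
        comb(2 * n, 2 * k) * comb(2 * n * n - n, 2 * r - k)
        for k in range(min(n, 2 * r) + 1)
    )
    g24_25 = (diagonal_reflection - g8 - 2 * g43) // 2
    g2 = g21 + g22_23 + g24_25

    # Asymmetric patterns take up the remaining configurations.
    g1 = (comb(4 * n * n, 4 * r) - g8 - 2 * g4 - 4 * g2) // 8
    return g8, g4, g2, g1


print(even_network_solution(2, 1))   # m = 4, p = 4
print(even_network_solution(3, 2))   # m = 6, p = 8
```

The values obtained for m = 4 and m = 6 can be compared with the corresponding rows of Table 1.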


Ratio of Symmetric to Total Number of Patterns 


Table 1 gives a selection of numerical evaluations of $[G_8]$, $[G_4]$,
$[G_2]$, $[G_1]$, and the total number of patterns for networks up to $6^2$
positions. The total number of configurations $\binom{m^2}{p}$ for each
case is also listed. It is clear from Table 1 that the total number of
symmetrical patterns $[G_8] + [G_4] + [G_2]$ is only a small ratio of all
patterns possible for a particular case, and further that this ratio
decreases with increasing m, p according to a square root law. The latter
also follows more rigorously by examining the nature of the particular and
general solutions outlined above, whence it is seen that
$\{[G_8] + [G_4] + [G_2]\}$ is, in the main, a function of
$\binom{m^2/2}{p/2}$ and hence of $m^p$ and lower powers of m, while
$\binom{m^2}{p}$ is a function of $m^{2p}$ and lower powers.


Further it is also easily shown that 


i 2 
(ip) sv(m) < ve(%). 
2p p 2p 
Thus it would appear, both from the trend apparent in Table 1 and from
the nature of the solution formulas, that $\{[G_8] + [G_4] + [G_2]\}$
approximates $\sqrt{\binom{m^2}{p}}$, rather more for cases where the
symmetry is more pronounced (as for odd networks in general) and rather
less for the other cases (as for odd positions in even networks, where, for
example, there is no contribution from $[G_8]$, $[G_4]$, $[G_{21}]$, and
$[G_{22} + G_{23}]$).

TABLE 1

Numerical Evaluations for Small m, p

                                                     Total number    Total number of
 m    p    [G_8]   [G_4]     [G_2]         [G_1]     of patterns     configurations

 1    1      1       0          0             0             1               1
 2    1      0       0          1             0             1               4
 2    2      0       1          1             0             2               6
 3    1      1       0          2             0             3               9
 3    2      0       2          4             2             8              36
 3    3      0       2          8             6            16              84
 3    4      2       0         11            10            23             126
 4    1      0       0          2             1             3              16
 4    2      0       2          9            10            21             120
 4    3      0       0         14            63            77             560
 4    4      2       5         38           207           252           1,820
 4    5      0       0         42           525           567           4,368
 4    6      0       6         91           954         1,051           8,008
 4    7      0       0         70         1,395         1,465          11,440
 4    8      2      10        112         1,550         1,674          12,870
 5    1      1       0          4             1             6              25
 5    2      0       4         17            28            49             300
 5    3      0       4         57           258           319           2,300
 5    4      4       7        152         1,503         1,666          12,650
 5    5      4       7        328         6,475         6,814          53,130
 5    6      0      20        645        21,810        22,475         177,100
 5    7      0      20      1,085        59,540        60,645         480,700
 5    8      7      28      1,712       134,333       136,080       1,081,575
 5    9      7      28      2,372       254,178       256,585       2,042,975
 6    1      0       0          3             3             6              36
 6    2      0       3         24            66            93             630
 6    3      0       0         55           865           920           7,140
 6    4      3      15        264         7,227         7,509          58,905
 6    5      0       0        468        46,890        47,358         376,992
 6    6      0      28      1,698       242,618       244,344       1,947,792
 6    7      0       0      2,460     1,042,230     1,044,690       8,347,680
 6    8      6      87      7,062     3,778,989     3,786,144      30,260,340
 6    9      0       0      8,960    11,763,430    11,772,390        94 x 10^6
 6   10      0     108     21,468    31,762,596    31,784,172       254 x 10^6
 6   11      0       0     24,024    75,088,650    75,112,674       601 x 10^6


The small ratio of symmetric to total number of patterns leads to two
generalizations of practical consequence. First, in any physical problem
involving the elucidation of the pattern of a structure, it would be
clearly advantageous to seek any evidence of symmetry within the structure.
If symmetry is displayed, not only is the number of patterns possible
reduced significantly, but also these can be listed and checked
systematically under the groups and subgroups of $G_8$, $G_4$, and $G_2$.
Second, for p, m > 2 the total number of patterns $[p/m^2]$ is of the order
of $\frac{1}{8}\binom{m^2}{p}$, though always greater than this number.
Hence as a second approximation

\[ [p/m^2] = \frac{1}{8}\binom{m^2}{p}(1 + \epsilon), \]

where $\epsilon$ is a small fraction and $\epsilon \to 0$ as
$m, p \to \infty$. It may be shown that a good approximation for
$\epsilon$ is given by

\[ \epsilon \approx 4 \Big/ \sqrt{\binom{m^2}{p}}; \]

for if

\[ [p/m^2] = [G_1] + [G_2] + [G_4] + [G_8], \]

where $[G_4]$, $[G_8]$ are very small compared to $[G_2]$ (especially for
large m, p), then

\[ [p/m^2] \approx \frac{1}{8}\left\{\binom{m^2}{p} + 4[G_2]\right\} \approx \frac{1}{8}\binom{m^2}{p}\left(1 + 4 \Big/ \sqrt{\binom{m^2}{p}}\right). \]

This formula gives a good approximation for the total number of patterns,
as may be seen from Table 2, but it is, of course, no substitute for the
general solution when an exact enumeration of the different types of
patterns is required.
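
The quality of the approximation is easily examined numerically. The sketch below assumes that the estimate of the small fraction is 4 divided by the square root of the number of configurations, and compares the resulting totals with a few exact totals copied from Table 1; the names `approximate_patterns` and `table_1_totals` are illustrative.

```python
from math import comb, sqrt


def approximate_patterns(p, m):
    """Approximate total number of patterns of p counters on m x m positions."""
    configurations = comb(m * m, p)
    epsilon = 4 / sqrt(configurations)
    return configurations / 8 * (1 + epsilon)


# (m, p) -> exact total number of patterns, as listed in Table 1.
table_1_totals = {(3, 2): 8, (4, 4): 252, (5, 5): 6814, (6, 6): 244344}

percentage_errors = {}
for (m, p), exact in table_1_totals.items():
    estimate = approximate_patterns(p, m)
    percentage_errors[(m, p)] = 100 * (estimate - exact) / exact
    print(m, p, exact, round(estimate, 1), f"{percentage_errors[(m, p)]:+.2f}%")
```

The percentage error shrinks rapidly as m and p increase, in line with the trend of Table 2.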


APPENDIX 


Proof that every semi-symmetric pattern has two and only two configurations
Consider a configuration C invariant under $G_{41}$ but not under $G_8$,
i.e.,

\[ C = RC = R^2C = R^3C. \]

Now consider the configuration

\[ F = SC; \]

then

\[ RF = RSC = SR^3C = SC = F, \]
\[ R^2F = R(RF) = F, \]

and

\[ R^3F = R(R^2F) = F, \]













TABLE 2

Error of Approximation Formula for Small m, p

                                              Percentage
 m    p    ε (actual)    ε (estimated)*    error in (1 + ε)

 3    2      7/9            6/9               -6
 3    3     11/21           9/21              -6
 3    4      6/13           7/13               5
 4    2      2/5            4/11              -3
 4    3      1/10           1/6                6
 4    4      3/28           3/32              -2
 4    5      1/26           2/33               2
 4    6      1/20           1/22              (illegible)
 4    7      1/41           1/27               1.3
 4    8      1/25           1/28              -.4
 5    2      1/3            3/13              -8
 5    3      1/9            1/12              -2.5
 5    4      1/19           1/28              -1.6
 5    5      1/38           1/58              -.9
 5    6      1/66           1/105             -.5
 5    7      1/108          1/173             (illegible)
 5    8      1/153          1/260             -.2
 5    9      1/212          1/357             (illegible)
 6    2      2/11           4/25              -2
 6    3      1/32           1/21               2
 6    4      1/50           1/61              (illegible)
 6    5      1/199          1/153             (illegible)
 6    6      1/280          1/349             -.08
 6    7      1/848          1/722             -.02
 6    8      1/1050         1/1375            -.02
 6    9      1/2627         1/2425            -.002
 6   10      1/2940         1/4000            -.009
 6   11      1/6250         1/6125            -.0001

*$\epsilon$ is estimated by $4 \big/ \sqrt{\binom{m^2}{p}}$. The actual
value of $\epsilon$ is determined from

\[ [p/m^2] = \frac{1}{8}\binom{m^2}{p}(1 + \epsilon), \]

where $[p/m^2]$ is known. Both actual and estimated values are presented in
round figures, where accuracy is not severely affected. The final column
gives percentage error in $(1 + \epsilon)$ and thus in $[p/m^2]$ by use of
$\epsilon$ (estimated).


therefore

\[ F = RF = R^2F = R^3F. \]

If also F = SC = C, then $C = SRC = SR^2C = SR^3C$, i.e., both C and F
would be identical configurations invariant under $G_8$ and hence
previously included. If, however, F = SC ≠ C, then F and C are different
configurations based on the same pattern, and both are invariant under
$G_{41}$ but not under $G_8$. Also the invariance of C and F involves all
the elements of $g_8C$; therefore there are no other configurations based
on this pattern. Similar proofs apply to configurations invariant under
$G_{42}$, $G_{43}$.

Proof that every anti-symmetric pattern has four and only four configurations
Consider a configuration $C_1$, invariant under $G_{21}$ but not under
$G_8$, $G_{41}$, $G_{42}$, $G_{43}$, i.e., $C_1 = R^2C_1$. Now consider the
configuration $C_2 = RC_1$;

\[ R^2C_2 = R^3C_1 = R(R^2C_1) = RC_1 = C_2. \]

If also $C_2 = RC_1 = C_1$, then $C_1 = RC_1 = R^2C_1 = R^3C_1$, i.e., both
$C_1$, $C_2$ would be identical configurations invariant under $G_{41}$ and
hence previously included.

If, however, $C_2 = RC_1 \neq C_1$, then $C_1$, $C_2$ are different
configurations, based on the same pattern and both invariant under $G_{21}$
but not under any other group of higher order. It may be shown similarly
that there exist two other configurations, viz.

\[ C_3 = SRC_1 \quad \text{and} \quad C_4 = SC_1, \]

invariant under $G_{21}$ but under no other group of higher order.

Hence $C_1$, $C_2$, $C_3$, $C_4$ are four different configurations, all
invariant under $G_{21}$ and based on the same pattern. Also the invariance
of $C_1$, $C_2$, $C_3$, $C_4$ involves all the elements of $g_8C_1$;
therefore there can be no other configurations based on this pattern.
Similar proofs apply to configurations invariant under $G_{22}$, $G_{23}$,
$G_{24}$, $G_{25}$.

Proof of Theorem II. Consider a configuration $C_1$ invariant under
$G_{22}$, i.e., $SC_1 = C_1$; then there will always exist a different
configuration $C_2 = RC_1$; but
$SR^2C_2 = SR^3C_1 = RSC_1 = RC_1 = C_2$, therefore $C_2$ is invariant
under $G_{23} = \{SR^2\}$. Consider now a third configuration
$C_3 = RC_2 = R^2C_1$; then $SC_3 = SR^2C_1 = R^2SC_1 = R^2C_1 = C_3$,
therefore $C_3$ is also invariant under $G_{22}$. Similarly it may be shown
that $C_4 = R^3C_1$ is also invariant under $G_{23}$. These four
configurations and their invariants exhaust all the elements of $g_8C_1$;
therefore the only configurations based on this pattern are two invariant
under $G_{22}$ and two under $G_{23}$.

Proof of Theorem III. This theorem follows from the possibility of placing
the odd counter in the unique central square (0, 0) of the network. The
symmetry of configurations of 2r counters invariant under the 4- and
8-order groups and of $G_{21} = \{R^2\}$ is not affected, and yields a
unique configuration of the same type with 2r + 1 counters. Conversely, all
patterns involving 2r + 1 counters in these groups must have the odd
counter at (0, 0), the removal of which yields a unique configuration of
the same type with 2r counters.


Manuscript received 9/9/58 
Revised manuscript received 1/22/59 


A SYNTHESIS OF TWO FACTOR ANALYSES OF 
INTERMEDIATE ALGEBRA* 


WILLIAM E. KLINE 


BOARD OF EDUCATION 
OF BALTIMORE COUNTY, MARYLAND 


A battery of 18 tests of intermediate algebra and 20 reference tests was 
administered to two successive second-year algebra classes. Each battery was 
separately factor analyzed by Thurstone methods, and the two analyses were 
synthesized by the Tucker method. The five congruent factors obtained were 
identified as: Verbal Comprehension, Deductive Reasoning, Algebraic Ma- 
nipulative Skill, Number Ability, and Adaptability to a New Task. 


This study aims to explore intensively the small area of intermediate, 
or second-year, algebra by surveying the abilities employed in solving the 
variety of problems included in an intermediate algebra course. The data 
collected for the study provide an excellent opportunity to apply, and thus 
to test, Tucker’s method [14] for synthesis of factor analysis studies. 

The author’s hypotheses before conducting the study included: (i) for 
the solution of algebra problems important basic abilities are inductive and 
deductive reasoning, rote memory, number, verbal comprehension, and 
spatial visualization; (ii) a basic ability of algebraic manipulative skill is 
involved in intermediate algebra; (iii) fluency of expression is required in 
the solution of statement problems; (iv) all abilities necessary for the solu- 
tion of statement problems can be measured by multiple choice tests. 

To test these hypotheses a battery of 38 tests was administered to each 
of two successive second-year algebra classes. Of these tests 18 were algebra 
tests and 20 were reference variables. Separate tables of intercorrelations 
were completed for each of the two years, and each was factor analyzed by 
the Thurstone [11] complete centroid method. The separate factor analyses 
were combined by Tucker’s technique [14] for synthesis of factor analysis 
studies. The loadings of factors congruent to both analyses were determined. 
The axes were rotated in the congruent space to simple structure prior to 
factor interpretation. 

To determine the efficacy of multiple choice answers for statement
problems, two pairs of similar tests were constructed and included in the
analysis. One of each pair used free answers and the other used multiple
choice answers. The factor content of these pairs of similar tests was
compared to determine whether or not different abilities were measured by
the two methods of answering statement problems.

*This paper is a condensation of a thesis. The work was begun while the author was
a Psychometrics Fellow of the Educational Testing Service. The work was further supported
by Contract N6onr-270-20 of the Office of Naval Research and by Grant NSF 42 of
the National Science Foundation. The writer is indebted to Professors Harold Gulliksen
and Ledyard R Tucker for their guidance throughout this study.


Description of the Variables 


The following is a brief description of the 38 variables used in the study.* 
In the parentheses following the description of each reference variable, 
numbered 19-38, is the name of the factor which the author expected to be 
identified by the test. 


1. Fundamental Operations: algebraic addition, subtraction, multiplication, and division
of monomials and polynomials.

2. Fractions: reducing, adding, subtracting, multiplying, and dividing algebraic fractions;
multiplying mixed expressions; simplifying complex fractions.

3. Factoring: all of the common types of factoring exercises.

4. Quadratic Equations: some solvable by factoring, others requiring the quadratic formula
or method of completing the square.

5. Radicals: simplifying monomial radicals; fundamental operations of monomial and
binomial radical expressions.

6. Exponents: problems based on the four basic laws of exponents; problems based on
the meaning of the zero exponent, the negative exponent, and the fractional exponent.

7. Binomial Theorem: questions aimed to measure the student’s thorough knowledge of
the binomial theorem.

8. Progressions: questions testing the student’s knowledge of arithmetic and geometric 
progressions. 

9. Use of Tables: reading tables of common logarithms, trigonometric functions, and 
logarithms of trigonometric functions. 

10. Principles of Logarithms: questions aimed at measuring the student’s comprehension 
of logarithmic principles, the understanding of the definition of logarithm and its 
implications, and the four basic laws. 

11. Cartesian Graphs: questions on elementary analytic geometry of the straight line and 
the conic sections. 

12. Simultaneous Equations: pairs of equations, linear or quadratic. 

13. Informational Ability in Algebra: aimed at measuring the student’s familiarity with 
new verbal concepts introduced into mathematics at the intermediate algebra level. 

14. Theory of Quadratics: questions on the nature of the roots, the discriminant, the re- 
lationship of the roots to the coefficients, and the parabolic graph of quadratic equations. 

15. Converting Statements to Symbols—A: direct translation of words into algebraic symbols. 

16. Converting Statements to Symbols—B: like Test 15. 

17. Word Problems—B: writing the algebraic expressions for the given verbal phrases or 
the equations that would solve the stated problem. 

18. Word Problems—A: like Test 17. 

19. Addition and Division: addition of three one-digit or two two-digit numbers; division 
of two-digit or three-digit numbers by a number between 2 and 12, inclusive. Prepared 
by Educational Testing Service [3]. (Number) 

20. Subtraction and Multiplication: subtraction of one-digit or two-digit numbers from 


Me of the tests have been deposited as Document number 6006 with the ADI 
Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington, 
25, D.C. A copy may be secured by citing the Document number and by remitting $15.00 
for photoprints, or $4.75 for 35 mm. microfilm. Advance payment is required. Make checks 
or money orders payable to: Chief, Photoduplication Service, Library of Congress. 

two-digit numbers; multiplication of two-digit or three-digit numbers by one-digit 
or two-digit numbers. Prepared by Educational Testing Service [3]. (Number) 
Opposites: selecting from five words the one which is opposite in meaning to the given 
word. Similar to tests of Garrett [4], Harrell [6], Peterson [8], and Sisk [9]. (Verbal 
Comprehension) 

Similes: patterned after Taylor’s test [10]. The student finished incomplete similes 
with three different words or phrases. (Fluency of Expression) 

Inventive Opposites: patterned after Thurstone’s test [12]. The student responded with 
two words opposite in meaning to the given word. The initial letters of the correct 
responses were given. (Fluency of Expression) 


Letter-Blank Sentences: sentences created by the subject, with the number of words
equal to the given number of dashes and letters. Each word replacing a letter was
required to begin with that letter. (Fluency of Expression)

Figure Analogies: similar to Army Air Force’s Figure Analogies [1]. (Deductive 
Reasoning) 

Block Counting: the student counted in a pile of blocks, pictured in the test, the number 
of shaded blocks that touched the numbered blocks. Prepared by Educational Testing 
Service [3]. (Space) 

Letter Series: patterned after a Thurstone test [13]. The subject was required to indicate 
the fourth letter after the last letter in the given series. (Inductive Reasoning) 

Letter Sets: similar to tests of the Thurstones [5, 13], Coombs [2], and Taylor [10]. The 
subject selected the one group of four letters that did not possess the characteristic 
common to the other four groups of letters in the item. (Inductive Reasoning) 

Family Trees: similar to the Thurstones’ Pedigrees [13]. The subject answered ques- 
tions concerning relatives of certain members on the given family tree chart. (Inductive 
Reasoning) 

Following Directions: similar to the Thurstones’ test [13], in which the subject
performed simple tasks on the paper with a pencil, following easy directions.
(Deductive Reasoning)

Similar Figures: prepared by Educational Testing Service [3]. The subject selected
the one of the four given figures which was a reversal. (Space)

Tabular Completion: similar to Thurstone’s test [12]. The subject filled in missing data 
in tables of numerical data. (Deductive Reasoning) 

Word-Word: similar to Garrett’s tests [4]. A paired-associates test composed of common
four-letter words. The subject was given four minutes to learn 20 pairs; immediately
thereafter he was asked to complete each pair, when given one of the two words. (Rote
Memory)

Number-Number: similar to Thurstone’s test [12], like Test 34, but using pairs of two- 
digit numbers. (Rote Memory) 

Final Examination in Algebra: a test of The Choate School. The same test was used 
for the two years of the study. (Grades) 

Second Semester Grade in Algebra: for the student at Choate for the year of the study. 
(Grades) 

Final Grade in English: for the student at Choate for the year of the study. (Grades) 


Two Analyses and Their Synthesis 


The battery of tests was administered to all students in second-year
algebra in 1950-51 and 1951-52 at The Choate School, an independent boys’
preparatory school in Wallingford, Connecticut. About 60 percent of these
students had studied their first-year algebra at Choate; the others had begun
their study of algebra in the school they attended prior to attending Choate.
The number of subjects included in the study was, by coincidence, 126 for
each year. The tests were given in 19 testing periods, none longer than 40
minutes. Although the reference tests were given without advance notice, the
algebra tests were announced at least five days in advance.

Except for differences in teaching methods used by the five teachers in 
each of the two school years, the 11 divisions of second-year algebra were 
equally prepared for the algebra tests, in that all students received similar 
daily assignments. In 1950-51 the textbook used was: Welchons, A. M. and 
Krickenberger, W. R., Algebra Book Two; Ginn, 1949. In 1951-52 the text- 
book was: Snader, Daniel W., Algebra (Meaning and Mastery) Book Two; 
Winston, 1950. 

Instructions on the 18 algebra tests were brief, simple, and included the 
time limit, varying from 10 to 30 minutes. The least number of items in any 
test was 24. The author prepared all the items for these tests. The scores on 
the tests were converted into grades, which were a large part of the student’s 
final record in algebra for the year’s course. To achieve optimum performance 
from the students, the author informed them before the testing that final 
grades would depend upon their performance on these tests. 

Instructions on the reference tests were often quite elaborate, usually 
including examples and practice problems. Time limits varied from 4 to 20 
minutes. The brevity of the items enabled more to be administered in a 
shorter time than was true of the items on algebra tests. The number of 
items ranged from 26 to 100. Except for those five tests used with the per- 
mission of Educational Testing Service all the test items were original with 
the author. For motivation the students were informed at the beginning of 
the study that their scores on these tests would be used for guidance purposes. 

Three of the reference variables were students’ grades as reported at 
the end of the year—Final Examination in Algebra, Second Semester Grade 
in Algebra, and Final Grade in English. The data for the first year of the 
study have been labeled Study A; the second year’s data carry the notation 
Study B. 

The scores on the individual tests were intercorrelated separately for
each year of the study. The two 38 × 38 matrices of intercorrelations are
given in Table 1. Each of the matrices was factored by the complete centroid
method, using the highest entry in a column as an estimate of the
communality at each stage of the factoring. Distributions of the final residuals,
after 12 factors were removed in each study, indicate that the common factor
variance had been exhausted. The twelfth-factor residuals appear in Table
A*, and distributions of these final residuals are given in Table B. The
projections of the 38 test vectors upon the 12 orthogonal centroid reference
vectors for Study A are listed in Table 2. Those for Study B are listed in
Table 3.

*Copies of Tables A-M have been deposited as Document number 6007 with the
ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress,
Washington 25, D. C. A copy may be secured by citing the Document number and by
remitting $2.50 for photoprints, or $1.75 for 35 mm. microfilm. Advance payment is
required. Make checks or money orders payable to: Chief, Photoduplication Service,
Library of Congress.

Following Tucker’s notation [14] the reference-factor matrices in Tables
2 and 3 are labeled $F_{jmA}$ and $F_{jMB}$, respectively. The subscripts, m and M,
designate the reference factors. In synthesizing the two analyses two transformation
matrices, $T_{mrA}$ and $T_{MrB}$, must be found so that only negligible
differences in the matrices $F_{jrA}$ and $F_{jrB}$ will result when

$$F_{jrA} = F_{jmA} T_{mrA} \quad\text{and}\quad F_{jrB} = F_{jMB} T_{MrB}.$$


Only one of the 38 variables was not considered as an overlap test. 
Test 1, Fundamental Operations, was given without earlier preparation in 
Study A; whereas in Study B the students were given assigned work in this 
topic prior to the test. For this reason it was eliminated from the test of 
congruence. 

TABLE 4
Means and Standard Deviations of the Overlap Variables

Test     M_A      M_B      σ_A      σ_B
  2     20.42    21.00     4.74     4.59
  3     17.41    18.88     3.81     3.60
  4     28.11    27.8?     5.78     7.4?
  5     17.48    20.52     5.10     3.80
  6     20.33    20.90     4.68     5.76
  7     17.13    20.??     5.22     3.86
  8      9.??    11.15     3.79     3.90
  9     11.11     8.01     5.21     4.33
 10     ??.55    13.60     5.25     4.98
 11     10.85    13.01     3.16     4.55
 12     16.63    17.56     4.25     4.78
 13     16.78    17.00     6.94     6.53
 14     11.02    10.67     3.68     3.26
 15     17.77    19.60     3.96     2.9?
 16     16.54    18.00     5.07     3.47
 17     12.90    14.60     4.22     3.93
 18      9.99    11.29     4.53     4.75
 19     37.37    38.56     7.17     8.57
 20     30.49    31.31     7.24     8.36
 21     22.22    25.3?     7.99     8.14
 22     19.75    22.49     5.62     5.42
 23     24.42    24.88     7.53     7.09
 24     37.77    41.69     5.55     4.87
 25     16.77    16.36     4.73     4.61
 26     25.82    26.92     4.90     4.93
 27     15.69    17.3?     6.05     6.71
 28      9.06     9.90     3.55     4.10
 29     14.13    15.42     3.24     3.39
 30     16.30     ?        5.48     5.46
 31     17.7?    18.81     3.77     4.00
 32     17.42    18.8?     7.28     8.26
 33     22.31    25.17    10.53     9.78
 34     16.33    17.87     7.16     8.15
 35      9.78     9.5?     4.86     4.05
 36     75.39    76.91    14.83    13.89
 37     73.92    76.2?     9.39     8.33
 38     75.44    77.09     6.48     7.01

M_A = Mean score in Study A
M_B = Mean score in Study B
σ_A = Standard Deviation of the scores in Study A
σ_B = Standard Deviation of the scores in Study B


TABLE 5
Latent Roots

   Study A               Study B
Axis     Root         Axis     Root
p-1    14.4720        P-1    12.2873
p-2     2.3614        P-2     2.3003
p-3     1.1513        P-3     2.0106
p-4     1.5515        P-4     1.2271
p-5     1.2286        P-5     0.9263
p-6     0.7608        P-6     1.2936
p-7     0.5706        P-7     0.6672
p-8     0.6358        P-8     0.7549
p-9     0.1064        P-9     0.5348
p-10    0.3663        P-10    0.48?9
p-11    0.5427        P-11    0.3932
p-12    0.2553        P-12    0.2663


TABLE 6
Coefficients of Congruence
(arranged in order of size)

 φ_r²      φ_r
.9860     .9930
.9379     .9685
.9151     .9566
.8570     .9257
.7300     .8544
.3777     .6146
.2754     .5248
.2120     .4604
.0084     .0917


Table 4 lists the means and the standard deviations of the overlap
variables for each study. In order that a common unit of measurement
would be used for each test in both studies, adjustments were made on the
standard deviations, computed by the formulas


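$$d_{jA} = \frac{\sigma_{jA}}{\tfrac{1}{2}(\sigma_{jA} + \sigma_{jB})} \quad\text{and}\quad d_{jB} = \frac{\sigma_{jB}}{\tfrac{1}{2}(\sigma_{jA} + \sigma_{jB})}.$$

Reference-factor matrices for the tests with adjusted units of measurement
were computed by

$$F_{JmA} = D_{Jj} F_{jmA} \quad\text{and}\quad F_{JMB} = D_{Jj} F_{jMB}.$$

The subscript J designates the tests after the adjustment of the standard
deviations. Matrices $F_{JmA}$ and $F_{JMB}$ are given in Tables C and D.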
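
The adjustment can be illustrated numerically. The sketch below assumes the reading of the formulas in which each study's standard deviation is divided by the mean of the two standard deviations for that test; the function name `adjustment_factors` is hypothetical, and the sample values are the standard deviations of Test 3 in Table 4.

```python
def adjustment_factors(sd_a: float, sd_b: float) -> tuple[float, float]:
    """Return the multipliers that put one test on a common unit in both studies.

    Assumption: each study's standard deviation is divided by the mean of
    the two standard deviations, so the two factors always sum to 2.
    """
    mean_sd = 0.5 * (sd_a + sd_b)
    return sd_a / mean_sd, sd_b / mean_sd


# Test 3 in Table 4: standard deviations 3.81 (Study A) and 3.60 (Study B).
d_a, d_b = adjustment_factors(3.81, 3.60)
print(round(d_a, 4), round(d_b, 4))
```

Under this reading a test whose scores spread more widely in one study receives a factor above 1 in that study and a factor below 1 in the other.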

The latent roots and latent vectors of the matrices $F'_{JmA}F_{JmA}$ and
$F'_{JMB}F_{JMB}$ were computed on the IBM 701. The latent roots for both matrices
are shown in Table 5. The latent vectors for Study A are given in matrix
$\Lambda_{mpA}$ in Table E, and the latent vectors for Study B are given in matrix
$\Lambda_{MPB}$ in Table F.

Three principal axes in each study (p-9, p-10, p-12, P-10, P-11, and 
P-12), corresponding to the lowest latent roots in Table 5, were discarded 
because it is necessary to exclude from the congruent space those dimensions 
into which the overlap tests have small projections. As yet no rule has been 
formulated to determine which principal axes should be retained and which 
should be relegated to the noncongruent space. In this study the author 
arbitrarily chose 0.5 as the critical value of the latent root. 
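
The selection rule is simple enough to state as code. This is a minimal sketch using Study A's latent roots as listed in Table 5; `retained_axes` is an illustrative name, and the cut-off of 0.5 is the author's arbitrary choice rather than an established criterion.

```python
# Latent roots of Study A from Table 5, keyed by principal-axis number.
STUDY_A_ROOTS = {
    1: 14.4720, 2: 2.3614, 3: 1.1513, 4: 1.5515, 5: 1.2286, 6: 0.7608,
    7: 0.5706, 8: 0.6358, 9: 0.1064, 10: 0.3663, 11: 0.5427, 12: 0.2553,
}


def retained_axes(roots: dict[int, float], cutoff: float = 0.5) -> list[int]:
    """Keep the principal axes whose latent root reaches the cut-off."""
    return [axis for axis, root in roots.items() if root >= cutoff]


kept = retained_axes(STUDY_A_ROOTS)
discarded = sorted(set(STUDY_A_ROOTS) - set(kept))
print("kept:", kept)
print("discarded:", discarded)
```

Note that the roots in Table 5 are not in strictly descending order, so the rule is applied to each root rather than to the last few axes.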

The Gramian matrix $H_p$ is given in Table G. The latent roots of this
matrix are the squares of Tucker’s coefficient of congruence, $\phi_r$.

$$H_p = GG',$$

in which

$$G = T'_{mpA} F'_{JmA} F_{JMB} T_{MPB},$$

where

$$T_{mpA} = \Lambda_{mpA}\beta_A^{-1/2} \quad\text{and}\quad T_{MPB} = \Lambda_{MPB}\beta_B^{-1/2}.$$

$\beta_A$ is a diagonal matrix containing the latent roots of $F'_{JmA}F_{JmA}$, and $\beta_B$ is a
diagonal matrix containing the latent roots of $F'_{JMB}F_{JMB}$. The latent vectors
of $H_p$ are given in matrix $\Lambda_{pr}$ in Table H.
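
Why the latent roots of the Gramian matrix are squared coefficients can be seen from a singular value decomposition of G. The following is a sketch in modern notation rather than Tucker's own; the symbols U, V, and Φ are introduced here only for illustration, and G is taken to be the 9 × 9 matrix relating the nine principal axes retained in each study.

```latex
G = U \Phi V', \qquad U'U = I, \quad V'V = I, \quad
\Phi = \operatorname{diag}(\phi_1, \ldots, \phi_9),
\\[4pt]
GG' = U \Phi V' V \Phi U' = U \Phi^{2} U'.
```

The latent vectors of GG' are therefore the columns of U, and its latent roots are the squares of the diagonal entries of Φ. If the principal-axis loadings of each study have been scaled to orthonormal columns (an assumption of this sketch), each of these entries is the cosine of the angle between a matched pair of axes and so lies between 0 and 1.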

Table 6 gives the values of $\phi_r$ for the nine factors remaining in each
study. As yet no criteria have been developed as to what minimal value of $\phi_r$
may be acceptable for indicating congruence. Because the formula for $\phi_r$ is
similar to the formula for the product moment correlation between the loadings
on the factor r for Studies A and B, those values in the range acceptable for
reliability of good tests might be considered as being indicative of considerable








TABLE 9 


Study A 















































TABLE 7 
Factor Loadings of Overlap Tests on the 
Transformation Matrix -- Study A Five Factors in the Congruent Space 
Pura * FJmalmra 
Tog 
Centroid Rotated factors A B c D E 
pers ts 2 10 +23 oh2 -.16 +16 
wee Z : r . a a ae a oe 
= ~5010 0217 +0318 -.0919 20598 4 08 4 025 -.20 +03 
5 35 22h +20 -.18 229 
di +2379 25222 = 20426 0510 25645 6 229 226 209 -.0h ell 
7 29 +22 -.05 -00 oly 
iii = 246 5058 3699 +3226 - 5724 8 50 o2ki -.10 -.08 -.19 
9 51 219 -.13 -.10 +09 
iv 3210 -.1296 -.3033 9286 1934 10 53 o17 -.06 -.0h -.05 
pe +0 20 -.14 12 02 
v 1760 -.0243 21969’ +3012 1063 12 30 213 04 -.17 .08 
13 64 -10 0h 13 -.12 
vi =. 2682 -.3125 1907 0006 2112 us 37 222 -.06 06 04 
15 -67 ~.06 OL 218 -.14 
vii 0903 4910 - +4520 -.0450 -.2h72 16 59 00 06 225 -.04 
17 -61 -.03 -07 02 -.03 
vili 0721 +his32 -.4722 0118 = 4261 18 621 205 -.08 oly -.11 
19 30 -.13 -.18 -.67 -.15 
ix 20946 02347 0438 0028 -.1319 20 218 -.11 ~.22 -.63 -.15 
21 58 -.09 219 20 -.35 
x 20646 -.1297 -.0001 -.0974 1331 22 265 -.14 06 215 -.29 
23 32 -.32 +27 08 09 
xi = -.04,37 068), -.5727 -.1007 20565 2h 42 -.25 +10 03 -.05 
25 238 -.40 18 -.28 222 
xii 0739 -.0546 21213 -.1105 -.0233 26 oS 06 -.21 +13 +29 
27 38 -.14 =.09 lh 3k 
28 +39 -.12 -.02 -.11 02h 
29 = -.15 10 ~—-— a 
30 57 -.30 -.07 01 i 
TABLE 6 5}? 2-102 =. 
32 3 -.0 -.13 . e 
Tamtnentien Selvin ~~ Steady 8 33 re Sa Sa” ae 
by 3 227 -.23 -.01 09 +32 
oe .- & -a aa Cae. oe 
Centroid Rotated factors a4 2 3 = = = 
factors A B c D E 38 3h -07 +20 205 +20 
I +7386 0216 0397 -.0893 20476 
II -.2108 7775 0053 +1937 1150 TABLE 10 Study B 
5 ae ag P me Factor Loadings of Overlap Tests on the 
III 3236 13hh 0575 7316 099 Save Pecteos Su toh Seek tonne 
IV =.0707 +1194 = =. 7326 = = 2608 1122 Fyrp * Fpturs 
Vs -=.2733 -.2472 +32h9 -.1919 8580 A B c = = 
2 17 216 +36 - 019 21 
VI 3767 -.0499 - 4148 «3881 22324 3 15 22 +29 -.21 218 
4 01 3h 34 225 06 
VII -.3836 23112 0219 -.0640 -.2h98 3 225 17 223 -.12 06 
LO 32 +03 -.08 09 
VIII = =.15446 -.1314 -.0555 -.0909 21262 7 227 18 -.02 -.12 18 
8 50 17 -.09 -.06 -.06 
IX  -.2797 1437 +1981 2067 -.2112 9 oh3 +18 -222 -01, 03 
10 54 17 -.10 -.02 03 
X -.1hh9 -.0151 2275 = 047 -.0868 nu 50 ol? -.08 0k -.09 
12 33 20 220 -.12 05 
xI -.0541 -.0011 0873 -.060) 0983 13 60 015 205 oly -.10 
4 oh2 22 -.1, 05 -.11 
yeas 0555 -.0579 -006L, -.0313 1190 15 66 -.15 -.07 18 ell 
16 58 -.05 03 18 -.03 
17 6h 00 13 11 -.03 
18 67 07 09 18 -.10 
19 32 -.15 -.20 -.62 -.18 
20 olf -.12 --11 -.72 -.13 
21 52 -1h 221 12 -35 
22 62 -.18 16 16 -.22 
2 23 229 -229 19 -.03 205 
—— S We oe: eo oe ee 
po 25 038 --37 215 -19 223 
26 3 -02 -.16 13 30 
a d c d e 27 38 -.22 93 222 36 
A 490 512 121 269 264, 28 oh6 =.09 -.09 -.19 026 
29 229 -.20 -97 -.30 -10 
B - 406 581 427 e132 = 585 30 57 -625 08 00 04 
31 5k -.29 -.07 01 -.16 
c -h960— =.53h +779 =236 067 32 37 02 -.2h 13 oS 
23 5h a -.28 a — 
D 22 z -.348  =.852 -006 : -. -.07 20 22 
= sat . F 35 09 -.18 09 -.0L 025 
E' -.522 309 -27h == .258 +764 36 60 +22 -.10 02 05 
37 2 3h -.12 -.03 02 
38 oh2 03 .09 202 -.20 











congruence. The five factors having coefficients of congruence above .80
were retained in the congruent factor space; the remaining four were relegated
to the noncongruent space.
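
The retention rule can be checked against Table 6. The sketch below assumes that the first column of Table 6 holds the latent roots of the Gramian matrix (the squared coefficients) and the second column the coefficients themselves, a reading supported by each second-column entry being the square root of the first. The variable names are illustrative, and the ninth latent root is omitted because it is only partly legible.

```python
import math

# Columns of Table 6 as read here (assumption: squared coefficient, coefficient).
SQUARED = [0.9860, 0.9379, 0.9151, 0.8570, 0.7300, 0.3777, 0.2754, 0.2120]
PHI = [0.9930, 0.9685, 0.9566, 0.9257, 0.8544, 0.6146, 0.5248, 0.4604, 0.0917]

# Each listed coefficient should agree with the square root of its latent root.
max_gap = max(abs(math.sqrt(sq) - phi) for sq, phi in zip(SQUARED, PHI))

# Factors kept in the congruent space: coefficient of congruence above .80.
congruent = [phi for phi in PHI if phi > 0.80]
print(len(congruent), "factors retained; largest rounding gap", round(max_gap, 5))
```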

Tables 7 and 8 show the matrices $T_{mrA}$ and $T_{MrB}$ which transform $F_{JmA}$
and $F_{JMB}$ into two matrices in which the differences are negligible for the
overlapping tests.

$$T_{mrA} = Y_{mrA} D_r \quad\text{and}\quad T_{MrB} = Y_{MrB} D_r,$$

in which

$$Y_{mrA} = T_{mpA}\Lambda_{pr} \quad\text{and}\quad Y_{MrB} = (T_{MPB} G')(\Lambda_{pr})\phi_r^{-1},$$

where $D_r$ is a diagonal matrix of the reciprocal of the square root of the
average of the sums of the squares for entries in corresponding columns of
$Y_{mrA}$ and $Y_{MrB}$.

The matrix $F_{JrA} = F_{JmA}T_{mrA}$, shown in Table 9, represents the factor
loadings of the overlap tests in Study A on the five factors which are in the
five-dimensional space congruent with Study B. The factor loadings of the
overlap tests in Study B on the same five factors are shown in matrix
$F_{JrB} = F_{JMB}T_{MrB}$ in Table 10.

Factor loadings for each pair of factors in the matrices $F_{JrA}$ and $F_{JrB}$
were plotted on orthogonal axes. After ten rotations the composite result of
the transformation matrices, $\psi_{rs}$, is given in Table 11. Tables 12 and 13
show, respectively, $T_{msA}$ and $T_{MsB}$, the transformation matrices for converting
factor loadings on the orthogonal axes to factor loadings on the rotated
oblique factor axes.








TABLE 12 marae 
Transformation Matrices for Converting 
hoe remap ens abapenng ncn ce ys rit Factor Loadings on Orthogonal Axes to 
Factor Loadings on Rotated Oblique Axes Factor Loadings on Rotated Oblique Axes 
Tis Tsp 
A Bo ae D E A B c D E 
z +2920 3406 «1932 «2507 21835 g «3318 3614 22034 ©2653 1888 
ii --6468 23751 ©3490 --3310 20516 II - 341 3921 3218 -.3831 ~-3600 
iii 2673 =01415 22450 -.4089 -.652h III 4225 ©1989 -. 3882 -.5237 1081 
iv 22195 04276 = 66126 = = 6820 +2510 Iv +5918 o4110 0 =. 4766 e4lhS = =.0457 
v 22269 2OLs3 01038 = =. 3252 1341 Vo =3828 =. 2125 04877 =. 2284 «6400 
vi 0205 = 3455 004720 =. 1643 2446 VI -.0159 oh968 = =04337 = 1498 200 
vii - 22694, os765 -.2151 02357 ~ 4,128 VII -.1957 -.1102 20672 - 20352 -4OLS 
viii -01572 L048 = +3406 222k -25106 VIII =. 1456 -.0961 -.0601 20422 21109 
ix 0435 01176 127k 0121 ~.1796 Ix 0718 -.1963 20609 - 3065 -22605 
x -.0773 -.0785 0083 20520 1367 x 0831 22309 21699 -.0597 -.0691 
xi ~ 21006 03175-04355 +2683 -20h05 xI -.0520 -.0515 21276 -.0201 20572 


xii 21029 -.0789 1315 0919 0351 XII -.0172 20231 0357 20175 119k 








$$T_{msA} = Y_{msA} D_s \quad\text{and}\quad T_{MsB} = Y_{MsB} D_s,$$

where

$$Y_{msA} = T_{mrA}\psi_{rs} \quad\text{and}\quad Y_{MsB} = T_{MrB}\psi_{rs},$$

and where $D_s$ is a diagonal matrix of reciprocal square roots of mean sums of
squares for entries in corresponding columns of $Y_{msA}$ and $Y_{MsB}$.

Matrices $F_{JsA}$ and $F_{JsB}$, shown in Tables 14 and 15, contain the loadings
of the tests on the rotated factors obtained by

$$F_{JsA} = F_{JmA}T_{msA} \quad\text{and}\quad F_{JsB} = F_{JMB}T_{MsB}.$$


Tables 16 and 17 show the correlations between the congruent factors. 
Tables J and K include the direction cosines for the noncongruent factors. 
Tables L and M list factor loadings on these seven factors of Study A and
B, respectively. These were determined by methods detailed in Tucker’s
paper [14].


Interpretation of the Results 


Identification of the Factors 

The five factors found in the congruent factor space have been labeled:
Factor A, Verbal Comprehension; Factor B, Deductive Reasoning; Factor C,
Algebraic Manipulative Skill; Factor D, Number Ability; Factor E, Adaptability
to a New Task. All tests with factor loadings of .30 or above in either
study are listed for each of the factors.


Factor A—Verbal Comprehension 





Test                                    Study A           Study B
no.   Algebra Tests                     Factor Loading    Factor Loading
13    Informational Ability                  .41               .36
15    Statements to Symbols—A                .50               .46
16    Statements to Symbols—B                .42               .40
17    Word Problems—B                        .38               .43
18    Word Problems—A                        .34               .46

      Reference Tests
21    Opposites                              .67               .64
22    Vocabulary Completion                  .62               .63
23    Similes                                .40               .33
24    Inventive Opposites                    .39               .39
30    Family Trees                           .34               .40
31    Directions                             .54               .45
38    Final Grade in English                 .34               .36


Factor A (Verbal Comprehension) had its highest loadings in the two
reference tests of vocabulary. Loadings above .30 appeared in all the reference
tests that were largely composed of words except the paired-associates test.
In the algebra tests, Factor A was evident only in the four statement problem
tests and in the test of informational ability.

Tests 15 and 16 involved the converting of verbal statements into
algebraic symbols. Since the words were not unusual and were, in the author’s
opinion, thoroughly familiar to all the students, the large loading of these
tests on Factor A suggests that this factor is somewhat contaminated with a
reasoning element. Perhaps the factor could have been named “Verbal
Reasoning.”


Factor B—Deductive Reasoning 





Test                                    Study A           Study B
no.   Algebra Tests                     Factor Loading    Factor Loading
 6    Exponents                              .27               .38
 7    Binomial Theorem                       .34               .28
 8    Progressions                           .35               .36
 9    Use of Tables                          .45               .44
10    Principles of Logarithms               .37               .43
11    Cartesian Graphs                       .41               .36
13    Informational Ability                  .33               .34
14    Theory of Quadratics                   .36               .38
18    Word Problems—A                        .40               .32

      Reference Tests
26    Figure Analogies                       .47               .42
27    Block Counting                         .33               .19
32    Similar Figures                        .30               .47
33    Tabular Completion                     .34               .35
36    Final Examination in Algebra           .50               .50
37    Second Semester Algebra Grade          .46               .46


The nine algebra and six reference tests in which Factor B had loadings 
above .30 indicate that it is a factor of Deductive Reasoning. Tests 26 and 
33 are similar to tests which have been used earlier in factor studies to identify 
a factor of deductive reasoning, defined here as the ability to reason from the 
general to the specific—to apply a principle or a rule to a specific problem 
when the principle or rule has been previously given, explained, and under- 
stood by the subject. 


Factor C—Algebraic Manipulative Skill 





Test                                    Study A           Study B
no.   Algebra Tests                     Factor Loadings   Factor Loadings
 2    Fractions                              .63               .57
 3    Factoring                              .62               .54
 4    Quadratic Equations                    .47               .60
 5    Radicals                               .51               .40
 6    Exponents                              .30               .30
12    Simultaneous Equations                 .24               .39

Factor C (Algebraic Manipulative Skill) appeared only in those algebra
tests composed of material that had been studied in elementary algebra and 
reviewed in the intermediate algebra course before the tests were given. Its 
highest loadings occurred in those tests which required a maximum of alge- 
braic manipulation. The Final Examination in Algebra was in part composed 
of problems that required manipulation, but these problems constituted 
only 32 percent of the total score. The other variables on which a larger 
loading may have been expected were Test 1, Fundamental Operations, 
which was a very easy test, and variable 37, Second Semester Grade in 
Algebra. The second semester in each year of the study was devoted to the 
topics of logarithms, numerical trigonometry, progressions, and the bi- 
nomial theorem. These topics involve very little algebraic manipulation in 
comparison with the topics taught during the first semester. 

That all the tests with significant loadings on Factor C represent topics 
taught in elementary algebra suggests the possibility that this factor is one 
of previous training. Evidence to the contrary exists in the factor loadings 
of Tests 15-18, which represent topics usually included in elementary algebra. 
Since these tests had very small loadings on Factor C, it is believed the ele- 
ment of previous training is not important in this factor. 

For all the tests having loadings above .30 on Factor C the directions
were very concise, e.g., “simplify,” “evaluate,” “solve,” “perform the indicated
operations,” etc. There is good evidence that the student was not required
to change set in the course of the test; it is possible that this factor
represents “an ability to maintain a fixed set.” The author’s thorough study
of the other tests leads him to believe that frequent changes of set were
demanded in each. The possibility exists, too, that the factor may involve
both the ability to maintain a fixed set and algebraic manipulative skill.
Further testing and analysis are necessary to determine the identity of this
factor. Its high correlation of .48 with Factor B, as shown in Table 16, would
suggest that some element of deductive reasoning may be involved in Factor C.


Factor D—Number Ability 





Test                                    Study A           Study B
no.   Reference Tests                   Factor Loading    Factor Loading
19    Addition and Division                  .82               .81
20    Subtraction and Multiplication         .80               .80
25    Letter-Blank Sentences                 .30               .22
29    Letter Sets                            .23               .34
33    Tabular Completion                     .44               .50


The very high loadings of Factor D on the reference tests 19 and 20,
incorporated in the analysis to aid in locating the number factor, provided
the name for this factor. Test 33 also required a considerable amount of
simple arithmetic computation. Test 25, on the other hand, contained no
numbers, requiring only the writing of a sentence of a given number of words.
It is difficult to understand why the old concept of the factor of number
ability played an important role in scoring high on this test; however, it
might have been that a sense of number aided in composing sentences of the
required number of words more rapidly. Test 29 contained no numbers and
required no arithmetic computation. The subject was required to determine
the characteristic that is common to four of the given five groups of four
letters each. For this reason Factor D may best be described as a factor of
number skill contaminated by some ability involving a fluency of ideas.

The author’s hypothesis that number ability would be found as a factor 
in much of the intermediate algebra is not substantiated by the loadings of 
Factor D in the algebra tests. 


Factor E—Adaptability to a New Task 





Test                                    Study A           Study B
no.   Reference Tests                   Factor Loading    Factor Loading
23    Similes                                .30               .25
25    Letter-Blank Sentences                 .44               .43
27    Block Counting                         .37               .44
28    Letter Series                          .30               .31
30    Family Trees                           .32               .29
32    Similar Figures                        .30               .35
34    Word-Word                              .39               .33


All the tests in the study which involved new tasks for the subject had
loadings above .30 on Factor E. None of the algebra tests had average loadings
on Factor E above .14 for the two studies. Similarly, all those reference
tests which involved tasks that the subjects had previously experienced had
small loadings on this factor. Factor E, then, seems to represent the ease
with which the student can learn a new task. It is noteworthy that this
factor clearly appeared even in connection with such relatively simple new
tasks as those included in this test battery. To some extent, Factor E is a
factor of speed, but not the motor speed involved in performing simple
manual tasks.


Multiple Choice versus Free Answer 


Test 15, a test of converting statements to algebraic symbols, was con- 
structed with multiple-choice alternatives. Test 16 was constructed with 
problems very similar to those of Test 15, but having no alternatives pro- 
vided. The subject was required to give free answers. Similarly, Test 17 is a 
test of word problems, or statement problems, with multiple choice answers, 
whereas Test 18, with the same kind of problems, requires free answers.

TABLE 18
Comparison of Factor Loadings for Multiple Choice and Free Answer Tests

                                          Statements to Symbols        Word Problems
                                          Multiple      Free           Multiple      Free
                                          choice        answer         choice        answer
Factor                            Study   Test no. 15   Test no. 16    Test no. 17   Test no. 18

A  Verbal Comprehension             A        .50           .42            .38           .34
                                    B        .46           .40            .43           .46
B  Deductive Reasoning              A        .26           .28            .24           .40
                                    B        .27           .26            .26           .32
C  Algebraic Manipulative Skill     A       -.02           .02            .12          -.06
                                    B       -.15           .00            .15           .10
D  Number Ability                   A        .06          -.08            .14           .03
                                    B        .10          -.01            .05           .02
E  Adaptability to a New Task       A        .09           .10            .13           .04
                                    B        .15           .14            .14           .06

Means                               A       17.8          16.5           12.9          10.0
                                    B       19.6          18.0           14.6          11.3
Standard deviations                 A        4.0           5.1            4.2           4.5
                                    B         ?            3.5            3.9           4.8





The mean score of each multiple choice test for both Study A and Study 
B is higher than the mean score for the corresponding free answer test; how- 
ever, the standard deviation of each multiple choice test is lower than that 
for the corresponding free answer test. The factor loadings on these four 
tests, Tests 15-18, as shown in Table 18, suggest little difference between 
the abilities needed to solve the multiple choice tests and those needed for 
solving the free answer tests. A study of the correlations of these two pairs 
of tests with the other 34 tests in the study, shown in Table 1, reveals this 
close similarity. The evidence tends to confirm the hypothesis: statement 
problems with multiple choice alternatives can be constructed to measure 
the same abilities as similar statement problems designed for free answers. 


Questions Remaining to be Answered 


This study would suggest the question: what serious consideration can 
be given to the factor loadings in a single factor analysis, by the Thurstone 
multiple-factor technique, if in two studies almost identical in nature only 
five of the twelve factors are congruent? It is true that on three of the twelve 
dimensions the overlap tests had only small projections. It is true, too, that 
if more severe criteria, as recommended by some writers, had been used to 
determine the completeness of the factor extraction, the number of factors 
in each study might have been as few as 6 or 7. Would it be wise in the future 
to conduct all factor analysis studies in two parts, to analyze each part 
separately, and then to test the two parts for congruence? 

It is possible, on the other hand, that Tucker’s technique crams a maximum
amount of congruence into the first few factors, and thereby creates
factors that are psychologically complex and somewhat obscure. This has
added to the difficulty of identifying the factors with any large degree of
assurance, and, too, it has added to the resulting high coefficients of correlation
between the factors after rotation.

If a Study C had been made for a third successive year, similar in nature
to Studies A and B, and if this analysis had been synthesized with the synthesis
of the first two years, would the number of congruent factors have
been diminished below five?

Factors of inductive reasoning, rote memory, spatial visualization, and 
fluency of expression—all expected by the author to appear in an analysis 
of intermediate algebra—failed to appear in the congruent factor space, 
unless they were undetected as a part of the five somewhat complex con- 
gruent factors. 

No attempt was made to rotate the noncongruent factors to a meaning- 
ful interpretation, and therefore no effort was made to analyze or to interpret 
the seven factors of Study A and the seven factors of Study B that did not 
fall in the space congruent to the two studies. The interpretation of these 
factors would seem to have value only in the analysis of the study for each 
year separately and not for the over-all two-year study.


REDUNDANCY IN TASK ASSIGNMENTS AND
GROUP PERFORMANCE*


ROBERT B. ZAJONC AND WILLIAM H. SMOKE

The problem of combining abilities of group members to maximize the
performance of the group as a whole is examined in terms of redundancy in 
task assignments. In particular, ways of distributing a given number of items 
of information among a given number of individuals to obtain the maximum 
probability of each item being recalled by at least one individual are studied. 
It is shown that there exists an optimal distribution scheme which is inde- 
pendent of the amount of material originally given, the size of the group, and 
individual differences in ability. 


The problem of assessing determinants of group performance has been 
investigated from two different points of view. For the most part, studies 
in this area have been concerned with the effects of group variables, such as 
the presence of others [1, 2], cohesiveness [e.g., 8], leadership style [e.g., 5], 
and the like, on the performance of groups and of individuals working in 
groups. Recently, some attempts have been made to analyze the group 
product by means of a combinatorial analysis of individual abilities. Lorge 
and Solomon [4] performed such an analysis in the area of group problem 
solving, and Hays and Bush [3] have used it in group learning. In principle, 
this latter approach is analogous to that of Moore and Shannon in what 
they called the “crummy relay problem” [6], which refers to constructing 
reliable circuits out of unreliable relays. von Neumann [9] has shown that 
by using a number of components of limited unreliability a reliable machine 
may be constructed. Moore and Shannon have demonstrated that a reliable 
circuit may be designed by using arbitrarily unreliable relays. The increase 
in circuit reliability is obtained essentially by increasing the redundancy 
among relays. It would seem that the study of group performance in terms 
of redundancy among the abilities of individual members would hold con- 
siderable promise. : 

Consider, for example, a group of N individuals. Let H items of informa- 
tion be given to these individuals. The object is to recover this information 
from the group as a whole after some interval of time. For the present purposes 
it is irrelevant which individual remembers a particular item, although the 
item should be remembered by somebody in the group. There is evidence in 


*This work was done under the sponsorship of the Behavioral Sciences Division, 
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the area of individual recall to the effect that the proportion of items recalled 
is inversely related to the number of items originally assigned [10]. Conse- 
quently, the probability that a given item is remembered by a given indi- 
vidual is some inverse function of the number of items he was asked to learn. 
On the other hand, the probability that at least one individual of those 
assigned the given item remembers it increases with the number of indi- 
viduals assigned the item. The first consideration implies minimizing the 
number of items per individual, the second maximizing it. The problem then 
is to discover the optimal distribution of items among individuals. The group 
as a whole is considered to remember an item when that item is remembered 
by at least one individual. 


Case I 


The following conditions are imposed on Case I. 

(a) The probability p(i, j) that item i is remembered by individual j is
equal to the constant p, 0 < p < 1, or to zero according to whether or not
item i is assigned to j.

(b) Each individual is assigned the same number, h, of items. Thus
$\sum_i p(i, j) = hp$.

(c) Each item is assigned to an equal number, n, of individuals. Thus the
probability, P, that a given item is recalled by at least one individual is
given by

(1)    $P = 1 - (1 - p)^{n}$,

the same for all items.

Under the conditions of Case I the problem is reduced to finding the 
assignment of items which generates the greatest value of P, i.e., finding 
values of p and n that will maximize P. If it is assumed that one is dealing 
not with number of items (a discrete measure) but with amount of material 
or amount of information (continuous measures), P may be regarded as a 
differentiable function of p. The necessary condition for P to be maximum 
as a function of p is that dP/dp = 0. Since h is a differentiable function of p, 
and since by (b) and (c) Nh = Hn, it follows that n is also a differentiable 
function of p. Thus, from (1) 


(2)    $\dfrac{dP}{dp} = -(1 - p)^{n} \left[ \dfrac{dn}{dp} \log (1 - p) - \dfrac{n}{1 - p} \right]$.

Given p ≠ 1, dP/dp = 0 only if the bracketed expression is zero, or

(3)    $\dfrac{dn}{dp} \log (1 - p) - \dfrac{n}{1 - p} = 0$.

Since n = Nh/H,

(4)    $\dfrac{dh}{dp} \log (1 - p) - \dfrac{h}{1 - p} = 0$.

Thus far the function relating p to h has not been specified. However, 
the condition in (4) holds for any set of values p and h which satisfy the 
relation $p = 1 - e^{-k/h}$, for an arbitrary constant k. Thus


$P = 1 - [1 - (1 - e^{-k/h})]^{n} = 1 - e^{-kn/h} = 1 - e^{-kN/H}$


is a constant since all the terms in the expression are constants. Thus, if the 
probability of a given item being recalled by a given individual were given 
by $p = 1 - e^{-k/h}$, all assignments would be equally good. Under these conditions
what is lost in p by assigning more items to each individual is gained
by increasing n, and consequently the probability that an item is remembered
by the group as a whole is independent of the assignment of items to the 
group members. 
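
The invariance described above can be checked numerically. The following is a minimal sketch, assuming the recall curve p = 1 - e^{-k/h} and the constraint Nh = Hn; the function names and the values chosen for N, H, and k are illustrative only and do not come from the paper.

```python
import math

def group_recall(p: float, n: float) -> float:
    """Equation (1): probability that at least one of n holders recalls an item."""
    return 1.0 - (1.0 - p) ** n

def p_neutral(h: float, k: float) -> float:
    """Assumed recall curve p = 1 - exp(-k/h), under which the load h should not matter."""
    return 1.0 - math.exp(-k / h)

# Illustrative values (assumptions, not taken from the paper).
N, H, k = 20, 100, 3.0

for h in (5, 10, 25, 50, 100):       # items assigned to each individual
    n = N * h / H                    # redundancy implied by Nh = Hn
    P = group_recall(p_neutral(h, k), n)
    print(f"h={h:3d}  n={n:5.2f}  P={P:.6f}")
```

Every row prints the same P, which equals 1 - e^{-kN/H}; heavier loads lower p exactly as fast as the added redundancy raises P.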

If, however, the relation between p and h is not given by $p = 1 - e^{-k/h}$
but by some other function p = f(h), then in general not all assignments will
be equally good. In fact, on the basis of empirical data available in this area
[10] it would appear that the function is of the form $p = e^{-k^{2}h^{2}}$, for an
empirical parameter k which depends on such factors as time, nature of the
material, its organization, meaningfulness, or the like. This function fits data
gathered by Oberly [7] with k = .10. Assuming $p = e^{-k^{2}h^{2}}$ and finding dh/dp,
condition (4) becomes


(5) (1 — p) log (1 — p) — 2p logp = 0, 


which is satisfied approximately for p = .84. Figure 1 shows the relation
between P and p for selected values of k with N/H = .01.

Solving for h, $h = \sqrt{\log .84 / (-k^{2})} = .42/k$. Thus, the best assignment
results when each individual is assigned .42/k items. In terms of p, the maxi- 
mum value of P obtains when each individual is given the number of items 
which would result in his forgetting about 16% of the material. Since the 
relationship between h, which denotes individual loads, and n, which reflects 
the amount of task-assignment redundancy, is known, the optimal amount 
of redundancy may be obtained. Thus, for Case I, n = (.42/k)(N/H), the 
optimal amount of task-assignment redundancy. 
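
The root of equation (5) and the resulting load can be reproduced with a short numerical check. This is a sketch under stated assumptions: the bisection bracket [0.5, 0.99] is chosen here because the left-hand side of (5) changes sign on it, the ratio N/H = 0.5 is an arbitrary illustrative value, and all function names are hypothetical.

```python
import math

def optimality_condition(p: float) -> float:
    """Left-hand side of equation (5): (1 - p) log(1 - p) - 2 p log p."""
    return (1.0 - p) * math.log(1.0 - p) - 2.0 * p * math.log(p)

def solve_optimal_p(lo: float = 0.5, hi: float = 0.99, tol: float = 1e-12) -> float:
    """Bisection on an assumed bracket where the condition changes sign."""
    f_lo = optimality_condition(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = optimality_condition(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid    # root lies in the upper half
        else:
            hi = mid                 # root lies in the lower half
    return 0.5 * (lo + hi)

def group_recall_at_load(h: float, k: float, ratio: float) -> float:
    """P = 1 - (1 - p)^n with p = exp(-k^2 h^2) and n = (N/H) h."""
    p = math.exp(-(k * h) ** 2)
    return 1.0 - (1.0 - p) ** (ratio * h)

p_star = solve_optimal_p()
load_coefficient = math.sqrt(-math.log(p_star))   # h = load_coefficient / k
k = 0.10                                          # value reported for Oberly's data
h_star = load_coefficient / k
print(round(p_star, 2), round(load_coefficient, 2), round(h_star, 1))
```

Evaluating `group_recall_at_load` slightly below and slightly above `h_star` gives smaller values of P, which is consistent with the root being a maximum rather than a minimum.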

It is rather interesting to note that the result obtained is entirely inde- 
pendent of the size of the group, N, and the number of items, H. Thus for 
any given number of individuals and any given number of items, .42/k items 
per person represents the optimal assignment. Of course, the optimal amount 
of redundancy and consequently the maximum value of P vary with the 
ratio of individuals to items. The larger this ratio the higher the maximum 
possible value of P. Figure 2 represents the relationship between the maximum
values of P and the N/H ratio for some selected values of the parameter k.


Figure 1
The Relationship Between P and p for Different Values of the Constant k. 


The above solution was obtained by assuming h to be a continuous 
variable. Given a group of N individuals and a collection of H discrete items, 
it will not be possible in general to assign the items to individuals in more 
than a small number of ways. Hence the assignments do not vary continu- 
ously. As a matter of fact, for given values of N and H only some of all the 
possible assignments satisfy the conditions (b) and (c). It can be demon- 
strated that if D is the greatest common divisor of N and H, the number of 
assignments of H items to N group members satisfying (b) and (c) is equal 
to D, given that two assignments are not considered distinct when they 
assign the same number of items to each individual. 
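
The counting claim can be illustrated by brute force. The sketch below only enumerates the integer pairs (h, n) that satisfy Nh = Hn with 1 <= h <= H; it does not construct the item-to-person assignments themselves, and the function name is hypothetical.

```python
from math import gcd

def balanced_assignments(N: int, H: int) -> list[tuple[int, int]]:
    """All (h, n) with 1 <= h <= H and N*h == H*n, i.e. loads allowed by (b) and (c)."""
    return [(h, N * h // H) for h in range(1, H + 1) if (N * h) % H == 0]

# Illustrative group sizes (assumptions, not from the paper).
for N, H in [(4, 6), (12, 18), (7, 10)]:
    pairs = balanced_assignments(N, H)
    print(N, H, len(pairs), gcd(N, H), pairs)
```

In each case the number of admissible pairs equals the greatest common divisor of N and H; when N and H are relatively prime the only option is to give every item to every individual.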


Case II 


In Case I items of equal difficulties and individuals with equal recall
capacities were considered. Now the case where there exist individual differences
in recall will be examined.

Again a set of conditions is imposed. 
(a) The probability, p(i, j), that item i is remembered by individual j, is
equal to the constant p, 0 < p < 1, or to zero according to whether or not
item i is assigned to j.

(b) The number, $h_j$, of items assigned to individual j is such that
$p = e^{-k_j^{2} h_j^{2}}$, where $k_j$ is an empirical parameter obtained with respect to the
individual j.

(c) The items are so distributed that for each item i the probability that i
is remembered by at least one individual is equal to the constant
$P' = 1 - \prod_j [1 - p(i, j)]$, the same for all items.

Thus, the above conditions imply that different individuals will be 
assigned different numbers of items, depending on their individual abilities to 
remember them. The conditions also imply that each item i is assigned to the
same number of individuals n, or that the redundancy is equal for all items. 
Hence 


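(6)    $P' = 1 - \prod_j [1 - p(i, j)] = 1 - (1 - p)^{n}$.


Again, the necessary condition that P' be a maximum as a function of p is
that dP'/dp = 0. Thus


(7)    $\dfrac{dP'}{dp} = -(1 - p)^{n} \left[ \dfrac{dn}{dp} \log (1 - p) - \dfrac{n}{1 - p} \right]$.


If p ≠ 1, then


(8)    $\dfrac{dn}{dp} (1 - p) \log (1 - p) - n = 0$.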


In terms of the above conditions the number of assignments is equal to nH
and to $\sum_j h_j$. Hence $nH = \sum_j h_j$. Thus


$\dfrac{dn}{dp} = \dfrac{1}{H} \sum_j \dfrac{dh_j}{dp}$.


From restriction (b),


$\dfrac{dp}{dh_j} = -2k_j^{2} h_j e^{-k_j^{2} h_j^{2}} = -2k_j^{2} h_j p = (2p \log p) \dfrac{1}{h_j}$.

Thus


$h_j = (2p \log p) \dfrac{dh_j}{dp}$


and


$\sum_j h_j = (2p \log p) \sum_j \dfrac{dh_j}{dp}$.

Therefore

$\dfrac{dn}{dp} = \dfrac{1}{H} \sum_j \dfrac{dh_j}{dp} = \dfrac{1}{H} \cdot \dfrac{\sum_j h_j}{2p \log p} = \dfrac{n}{2p \log p}$

and

$\dfrac{dn}{dp} (1 - p) \log (1 - p) - n = \dfrac{n}{2p \log p} (1 - p) \log (1 - p) - n = 0$

or

(9)    $(1 - p) \log (1 - p) - 2p \log p = 0$.


Note that individual differences do not influence the solution, as (9) is satisfied
for p = .84. However, the number of items, $h_j$, to be assigned to the
different individuals will depend on their recall abilities which are reflected
in the constants $k_j$. If the assignment of items to individuals satisfies (a)
and (b), then for two individuals $j_1$ and $j_2$


$\exp(-k_{j_1}^{2} h_{j_1}^{2}) = p = \exp(-k_{j_2}^{2} h_{j_2}^{2})$

or

$k_{j_1} h_{j_1} = k_{j_2} h_{j_2}$.

Thus

$h_{j_2} = \dfrac{k_{j_1}}{k_{j_2}} h_{j_1}$

and in general

$h_{j_m} = \dfrac{k_{j_1}}{k_{j_m}} h_{j_1}$


for each member $j_m$ of the group. Since $\sum_j h_j = nH$,

$nH = \sum_j h_j = k_{j_1} h_{j_1} \sum_j \dfrac{1}{k_j}$
and in general

$h_j = \dfrac{nH}{k_j \sum_m (1/k_m)}$

for each individual j. In this case individual differences are exploited by
assigning fewer items to less able members and more items to the capable 
individuals. 
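
The Case II allocation can be sketched in a few lines. The example assumes the recall curve p = exp(-k_j^2 h_j^2) with the common optimum p = .84, treats loads as continuous (as the text does), and uses made-up values for the k_j and for H; the function names are hypothetical.

```python
import math

def allocate_loads(k_values, p=0.84):
    """Return loads h_j such that exp(-(k_j * h_j)**2) equals p for every member j."""
    c = math.sqrt(-math.log(p))          # common product k_j * h_j
    return [c / k for k in k_values]

def redundancy(loads, H):
    """Redundancy n implied by nH = sum of the h_j."""
    return sum(loads) / H

k_values = [0.05, 0.10, 0.20]   # hypothetical individual parameters (larger k = poorer recall)
H = 12                          # hypothetical number of items

loads = allocate_loads(k_values)
n = redundancy(loads, H)

# Same loads recovered from n and H alone: h_j = nH / (k_j * sum_m 1/k_m).
inverse_sum = sum(1.0 / k for k in k_values)
closed_form = [n * H / (k * inverse_sum) for k in k_values]

for k, h, h_cf in zip(k_values, loads, closed_form):
    print(k, round(h, 2), round(h_cf, 2), round(math.exp(-(k * h) ** 2), 2))
```

Halving k doubles the load, so the member with the smallest k carries the most items while every member keeps the same recall probability.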

While the value p = .84 is optimal under the restrictions specified 
above, it remains to be determined whether a different solution is obtained 
by relaxing restriction (a) such that p(i, j) is no longer required to be a
constant across individuals. 


Conclusions 


The solutions presented provide a standard against which empirical
results may be compared. Empirical tests must conform to the predictions,
otherwise the restrictions imposed on the solutions could not have been met.
Thus by careful experimental controls it can be discovered what variables
determine the departures from the prediction.

For instance, the conditions imposed above require that p(i, j), the
probability of the individual j recalling the item i, be constant. This, of
course, necessitates a complete independence of the recall probabilities in
terms of the items, as well as in terms of individuals. The probability $p(i_1, j_1)$
must be independent of $p(i_1, j_2)$ and of $p(i_2, j_1)$. Therefore, the cases examined
are valid not for groups of interacting members but for collections
of individuals working independently of one another.

This requirement, however, is not at all a shortcoming. On the contrary,
it allows one to study the effects of group interaction on individual
and group performance in recall. The existence of group interaction would
probably lower the value of the parameter k, and this effect can be evaluated
readily by empirical tests. The difference between the values of the parameter
k for individuals working together and for individuals working alone would
provide information on the effect of group interaction on individual performance,
and the difference between the corresponding values of P, information
concerning the effect of group interaction on group performance.

It is noted that the model presented is not restricted to recall, and that 
it may, with slight modifications, be applied to other behaviors, such as 
learning, problem solving, or decision making. 
Mood’s likelihood ratio test is generally considered an unreliable χ²
approximation in 2 × 2 contingency tables containing expected cell frequencies
less than five. Probability values were computed for 60 such tables as part
of an item analysis for two 30-item alternate forms of a measure. The rank
orders of the items, from best to worst differentiators, as determined separately
by Mood’s test and by Fisher’s exact test correlated .97 for one form
and .96 for the other.


Item analyses are often carried out not to determine the correlations 
of dichotomously scored items with a total test score, but rather to select a 
given number of items which best differentiates between high and low scorers. 
This type of analysis is facilitated by the use of 2 × 2 contingency tables
which compare for each item the score (one or zero) of extreme scorers on 
the total test. The writer recently administered to groups of 25 and 22 sub- 
jects two 30-item alternate forms of a measure which he wanted to condense 
into a single 30-item form. Because of the restricted sampling, many of the 
expected cell frequencies in the resulting contingency tables were less than 
five. Since χ² approximations are generally not reliable with small cell frequencies
[1], the probability values for the tables were computed directly
by Fisher’s exact test, and the 30 best items were identified accordingly. 
The probability values covered a wide range, from .008 to 1.00. 

Probability values for the tables were also estimated by Mood’s likelihood
ratio test ([2], pp. 257-281), which is distributed approximately as χ²
with (r - 1)(s - 1) degrees of freedom for large samples. Mood states,
however, that this large-sample approximation cannot be used without
appreciable error when r and s both equal two. To evaluate this possibility
with the 2 × 2 tables involved here, a comparison was made between the
rank order of the items (from best to worst) for each of the two forms as 
determined by Mood’s test and as indicated by the exact test. For one form 
the rank-order correlation (Spearman’s rho) achieved between these two 
methods was .97 and for the other it was .96. Thus, although assumptions 
for its use were not met, Mood’s test gave a very good indication of the 
relative probability values for the items. This may indicate that where ranking
is the goal, as in the item analysis described, the χ² approximation by Mood’s
likelihood ratio test is an adequate statistical tool even with small samples.


*Now at University of Rochester School of Medicine,
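
The comparison described in this note can be sketched as follows. This is not the writer's computation: the six 2 × 2 tables are invented for illustration, the likelihood ratio statistic is taken here to be G = 2 Σ O log(O/E) referred to χ² with one degree of freedom (an assumption about the form of Mood’s test), and all function names are hypothetical.

```python
import math

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    denom = math.comb(n, c1)

    def prob(x):
        # Hypergeometric probability of x in the upper-left cell, margins fixed.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def likelihood_ratio_p(a, b, c, d):
    """p-value of G = 2 * sum(O * ln(O / E)), using chi-square with 1 df."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    obs = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = obs[i][j]
            if o > 0:                      # empty cells contribute nothing
                g += 2.0 * o * math.log(o * n / (rows[i] * cols[j]))
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))   # chi-square(1) upper tail

def average_ranks(values):
    """Ranks from 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((u - mx) * (v - my) for u, v in zip(rx, ry))
    var_x = sum((u - mx) ** 2 for u in rx)
    var_y = sum((v - my) ** 2 for v in ry)
    return cov / math.sqrt(var_x * var_y)

# Invented item tables: (high pass, high fail, low pass, low fail).
tables = [(6, 1, 1, 6), (5, 2, 2, 5), (4, 3, 3, 4),
          (7, 0, 2, 5), (6, 1, 3, 4), (5, 2, 4, 3)]
exact = [fisher_exact_two_sided(*t) for t in tables]
approx = [likelihood_ratio_p(*t) for t in tables]
print(round(spearman_rho(exact, approx), 2))
```

On these invented tables the approximate p-values are noticeably smaller than the exact ones, yet the two methods order the items almost identically, which is the pattern the note reports.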
BOOK REVIEWS


JANE LOEVINGER. Objective Tests as Instruments of Psychological Theory. Psychological
Reports, Monograph Supplement 9, 1957, 636-694.


A brief journal review cannot possibly do justice to a 60-page monograph which 
is so tightly packed with highly significant ideas and original points of view. The mission 
of the monograph is to present the point of view that the time has come to dispose of 
classical concepts of validity and to replace them with a concept in keeping with modern 
science. In particular, the proposal is that such concepts as predictive, concurrent, and 
content validities be swept aside as mere handmaidens of a psychotechnology of dubious 
scientific status. The clean sweep would also include such off-brand varieties of validity 
as factorial validity. In their place, it is proposed, the only concept of validity to be
retained is that of construct validity, a concept which is developed more fully in this
monograph than it has been developed elsewhere. It is argued that predictive, concurrent, 
and content validity are ad hoc concepts which have little to do with what the scientist 
does or what he needs to know. 

The concepts of predictive and concurrent validity represent a concept of science 
which was typical of the late nineteenth century and which was presented by Karl Pearson 
in his famous Grammar of Science. A great amount of psychometric work today would 
represent an approach to the discovery of knowledge compatible with that presented by 
Pearson, and yet, in the hands of philosophers and logicians Pearson’s approach has not 
fared well. In a sense, the introduction of the term construct validity by the APA Com- 
mittee on Psychological Tests represents a major break with the classical correlational 
concepts of validity and an indication that psychometrics is recognizing the fundamental 
changes which have taken place in our conception of science since the days of Pearson. 
The central purpose of the monograph is to clarify the concept of construct validity and 
to begin to establish it as an idea of central importance in the development of a science 
of behavior. 

Jane Loevinger is attempting to initiate a revolution in thinking in the area of psycho- 
metrics so that the instruments that evolve will be instruments for scientific advance 
rather than gadgets in a technology. Objective tests, she believes, are to play a central 
role in the development of psychological theory. Tests must be based on a theory of test 
behavior which is to be related to a theory of behavior in nontest situations. The elements 
in a psychological theory, it appears, are variables, but it is not clear whether these are 
or are not to be defined by objective tests. 

The monograph proposes that the concept of validity be considered to include two 
components. One of these is the substantive component which is “the extent to which
the content of the items included in (and excluded from?) the test can be accounted for
in terms of the trait believed to be measured and the context of measurement. Context
includes psychological theory and, in particular, ‘the psychology of objective test behavior.’ ”
The writer suspects that what Loevinger means by psychological theory may
be different from what he means by psychological theory, but the concept of psychological 
theory involved in the document under review is not developed to the point where such 
a comparison is possible. Within the field of psychometrics, test items are often discussed 
as if their properties could be described in the same terms as other stimuli, such as a pure 
tone or a flash of light or an electrical discharge. The experimental psychologist is likely
to regard the properties of test items as stimuli as the consequence of antecedent learning
conditions, and to hold that the properties of the items must be measured in terms of those
conditions. Thus a semantic count provides a rough measure of typical exposure to particular
words used in particular contexts, and this measure in turn can be used to predict
some of the responses that will occur to the particular item when it is presented as a part 
of a test. Such a measure of semantic frequency is one dimension along which the substan- 
tive component of a test item can be fixed. Presumably there would be other components 
too. But how would the substantive characteristics of items be determined in the case of 
items from personality tests? How can one tie down the antecedent conditions which 
generate the particular stimulus values of the items? In the Loevinger monograph such 
problems are not elaborated and the reviewer is left with the uneasy feeling that Loevinger 
is concerned with intuitively derived psychological theory rather than that in which the 
constructs are hypothetical and include unobservables. This is no real criticism of the 
monograph but only points up the long and difficult road that lies ahead of the person 
who embarks on a program of integrating psychometrics with psychological theory. 

The second major component of validity proposed in the monograph is the structural 
component, which is a much better defined concept than is the substantive component. 
The structural component refers to “the extent to which structural relations between
test items parallel the structural relations of other manifestations of the trait being measured.”
Interpreted in terms of the language of systematic psychology, one might say that
structural validity implies that relations between responses within test situations should 
parallel in some way the relations discovered between responses in nontest situations. 
The structural component refers to the extent to which a test is a manifestation of a law 
relating responses to responses. Such laws, commonly described as R-R laws, have been
generally considered to be rather low-level laws; that is to say, the uses which they have 
are limited, and the boundary conditions for them can rarely be specified. Herein lies one 
of the major differences between psychometrics and experimental psychology. In psycho- 
metrics, R-R laws are sought, while the experimental psychologist seeks S-R laws because 
of the particular advantages which they possess. A thoroughgoing integration of psycho- 
metrics and current psychological theory would require that psychometrics become a 
means of developing S-R laws, but perhaps it is much too early to hope for this develop- 
ment. Loevinger has made a beginning by identifying some of the problems which this 
integration involves. 

The monograph provides an overview of various structural models that have been 
used and the problems which they involve. Under this heading the author has included 
quantitative models, class models, and dynamic models. The material could be dry to 
read, but it is not. Novel ways of looking at the problems involved add a continuous source 
of freshness to the discussion. Indeed, the incidental comments alone make this mono- 
graph worth reading, but it contains much more than incidental material. 

Loevinger writes with a contagious enthusiasm for her subject. Whether the reader 
agrees or disagrees with the arguments presented, he will come away from the monograph 
with the feeling that he has lived through a refreshing and worthwhile experience. 


University of Utah ROBERT M. W. TRAVERS


L’Analyse Factorielle et ses Applications. Paris: Centre National de la Recherche Scientifique,
1955. Pp. 431.


For five days in July, 1955, 37 factor analysts and scholars interested in factor
analysis, from 12 countries, met in Paris to discuss papers on factor theory and its
applications. This book is the result of their meeting.

The conference was conceived by Henri Laugier, well known as a physiologist and 
as a scientist with a strong interest in applied psychology. It was supported, in part, by 
a grant from the Rockefeller Foundation. Listed home countries for those actually present 
were: France, 15; Great Britain, 5; Sweden, 3; Egypt, Germany, Israel, Switzerland, and
the United States, 2 each; and Belgium, Canada, Spain, and Yugoslavia, 1 each. 

Among those who had been invited but who were not actually able to be present 
were two who were represented by papers: Sir Cyril Burt and L. L. Thurstone. Their 
messages of regret are on early pages of the report. In addition to his formal paper, Thurs- 
tone contributed a resumé of the Conference on Factor Studies held at Educational Test- 
ing Service, Princeton, New Jersey, in November, 1951. 

Papers were circulated in advance, and the meetings, with the help of translators, 
concentrated on discussion. Of 22 papers, 9 were originally in French, 13 in English. In 
the book all papers and summaries of the discussions are presented in French. The follow- 
ing list of papers is a good guide to the nature of its content. 

“Current problems and new methods in factor analysis,” L. L. Thurstone.

“The Uppsala Symposium on psychological factor analysis, Uppsala, March 17-19,
1953,” E. A. Peel.

“Dimensions of intellect,” J. P. Guilford.

“Factor analysis: methods and results,” Sir Cyril Burt.

“Le probléme général de la recherche et de la nature des facteurs en psycho-physiologie,”
(The general problem of research on factors and of their nature in psycho-physiology),
H. Pieron.

“Relations of the newer multivariate statistical methods to factor analysis,”
H. Hotelling.

“Remarques sur l’analyse factorielle de Hotelling et comparison avec les méthodes
centroides,” (Remarks on Hotelling’s factor analysis and a comparison with centroid
methods), H. Pineau.

“A statistical test for the stability of simple structure,” R. Bargmann.

“Psychological meaning of factor analysis as a research method,” M. Yela.

“Les facteurs psychologiques: quelques remarques sur leur nombre, leur identification,
leur nature,” (Psychological factors: remarks on their number, identification, and
nature), Miss G. Bernyer.

“Facteurs observés et facteurs théoriques en psychologie,” (Observed and theoretical
factors in psychology), M. Reuchlin.

“The radex approach to factor analysis,” L. Guttman.

“Utilisation du schéma de Spearman, dans le calcul des images,” (The use of a
Spearman formulation in the calculation of images), J. M. Faverge.

“Factor analysis and the problem of validity,” H. J. Eysenck.

“Emploi de l’analyse factorielle dans l’étude de la variabilité biologique,” (The use
of factor analysis in the study of biological variability), E. Schreider.

“Application de l’analyse factorielle à l’étude de la mortalité,” (Application of factor
analysis in the study of mortality), S. Ledermann.

“Observations theoriques sur l’analyse factorielle linéaire et générale,” (Theoretical
remarks on general, linear factor analysis), G. Darmois.

“Nouvelle méthode de statistique mathématique pour l’estimation des facteurs et
de leur écart-type en analyse factorielle,” (A new method in mathematical statistics for
the estimation of factors and their standard deviations in factor analysis), P. Delaporte.

Discussion of a proposal of R. B. Cattell, “A universal index for psychological factors.”

“Trends of research in space abilities,” A. H. El Koussy.

“Factor analysis of achievement tests: methodological considerations and some
empirical findings,” T. Husen and S. Henrysson.

“The factor analysis of person correlations and the use of independent determiners
to identify the factors,” E. A. Peel.

“Rapport de synthése,” (Summary of the conference), M. Reuchlin.

These papers and related discussions depict remarkably clearly the contemporary
state of the science and art of factor analysis. Contributors to the colloquium represented
a broad spectrum of views, both on theoretical issues and on practical applications of 
factor methods. 

There were differences in historical orientation. Thurstone’s paper traced the origins 
of factor analysis to the contributions of Spearman, while Burt’s paper ascribed the first 
version of the two-factor theorem to Galton and the method of principal axes to Pearson. 
Burt stated that the fundamental formula used by himself and his associates prior to 
1925 was the “centroid equation,” ultimately adopted by Thurstone as the basis of the 
centroid method. 

Controversies centering around the two-factor approach of Spearman as contrasted 
with the multifactor approach of Thurstone were not in evidence. However, the theo- 
retical position of Burt and his followers seemed to occupy middle ground as far as em- 
pirical findings were concerned. A general factor was still important in mental test material 
but there was generous provision for group factors and special abilities. 

Considerable attention was given to mathematical issues, especially on questions 
relating to sampling. Hotelling pointed out that many of the original objectives of factor 
analysis (but not all) could be better reached by other multivariate methods. He treated 
factor analysis as a means of estimating the dimensionality of a population, pointing out 
that the greatest use of present factor procedures was to suggest hypotheses susceptible 
of being proved more objectively by other methods. Darmois, mathematical statistician 
at the Sorbonne, pointed out a number of the probability considerations arising in factor 
analytic practices, while Delaporte presented a method of estimating the standard error 
of a factor loading. In a somewhat parallel development, Bargmann’s discussion of simple 
structure is of considerable interest. 

Several of the papers, notably those of Thurstone, Guilford, Burt, and Hotelling, 
presented points of view that would have been quite familiar to readers of Psychometrika 
in 1955. Thurstone was pleased that multiple factor analysis had been judged sufficiently 
useful to justify an international conference, and indicated some of its eventual scientific 
possibilities. Guilford summarized his findings of the preceding six years, using the general 
framework of the centroid method and orthogonal rotations to simple structure, and seek- 
ing factors of psychological significance in high aptitude personnel, especially in the areas 
of reasoning and creative intelligence. 

The volume is an important source of information on Guttman’s radex approach 
to factor analysis. In 1955, Guttman’s contributions to the classification of patterns of 
matrices of correlations were not yet well known. His views of the simplex and circumplex 
were elaborated not only in his paper but also in numerous discussions.

To this reviewer, one of the most surprising (and pleasing) papers was that of Pieron, 
who is well known to American psychologists, but hardly as a factor analyst. Yet such is 
Pieron’s versatility that, on the basis of reading 150 papers, he presented a thoughtful 
resumé not only of factor theory but also of applications in areas of special interest to 
investigators of problems in physiological psychology. 

The future historian of factor analysis will find this book a source of considerable 
information on trends and unsolved problems in the mid-fifties. The contemporary factor 
analyst will find some material not readily available elsewhere, as well as a number of 
stimulating papers and discussions. 

PHILIP H. DUBOIS


Washington University 
Minutes of the
1959 ANNUAL BUSINESS MEETING 
of the 
PSYCHOMETRIC SOCIETY 


The regular Annual Meeting of the Psychometric Society was held in Cincinnati, Ohio, 
on Tuesday, September 8, 1959. President Frederic M. Lord called the meeting to order at 
2:00 P.M. 


The minutes of the previous Annual Meeting were approved. 


On a ballot for the election of two new members of the Council of Directors, Dr. Jane 
Loevinger and Dr. John E. Milholland were elected for a term of three years, ending in 1962.


Dr. Lloyd G. Humphreys reported for the Membership Committee. The Membership 
Committee nominated 45 persons as full members, 60 student members to be transferred to 
full membership, and 32 individuals as student members. 


It was moved, seconded, and passed that the following 45 persons be elected as full 
members: 


Frank B. Baker                Sam Mayo
Donald B. Black               Jum C. Nunnally
Lyle E. Bourne                Ellis B. Page
Allen C. Busch                George J. Palmer, Jr.
Don J. Cosgrove               Mohammed Y. Quereshi
John W. Cotton                J. A. Radcliffe
William F. Dossett            Frank Restle
S. David Farr                 Lillian Capo de Rivero
David J. Fitch                W. S. Robinson
Raymond W. Frankman           Nathan Rosenberg
Paul A. Games                 Richard A. Schindler
Clifford P. Hahn              Charles F. Schumacher
Dean Harper                   Elmer Louis Streuning
Paul Heit                     Theresa G. Trittipoe
Miguelina N. Hernandez        Read D. Tuddenham
Edwin P. Hollander            David L. Wallace
John G. Hurst                 Wimburn Wallace
Thomas Owen Jacobs            Sam C. Webb
John E. S. de Jung            Frederic D. Weinfeld
William J. Laidlaw            Warren W. Willingham
Herbert C. Lansdell           Francis A. Young
Russell S. MacArthur          Joseph Zeidner
R. Daniel Malone


It was moved, seconded, and passed that the 60 student members listed below be 
transferred to full membership: 


Nancy S. Anderson             Herbert D. Kimmel
Norman Anderson               Rupert A. Klaus
David C. Beardslee            Henry E. Klugh
Morton A. Bertin              Richard H. Lindemen
Nicholas A. Bond, Jr.         Clifford E. Lunneborg
Eugene A. Bouvier             Richard D. Mann
James G. Boyce                Robert E. McClintock
Thomas H. Burkhalter          John F. Muldoon
Ralph McC. Chinn              Warren T. Norman
Paul Raymond Christensen      David B. Orr
Norman Cliff                  Lee Edward Paul
Bernard P. Cohen              William Prokasy, Jr.
W. C. Coppock                 James C. Reed
Mary Corcoran                 Charles V. Riche, Jr.
Lolafaye Coyne                William L. Robinson, Jr.
Elliott M. Cramer             Robert Sadacca
Kern W. Dickman               K. Warner Schaie
Sidney Epstein                Morris Schnore
Donald Estavan                Richard E. Schutz
Gordon Fifer                  Robert W. Scollay
William J. Flynn, Jr.         Ahmed El Sensussi
Roman Gawkoski                Roy G. Simpson
Clifton W. Gray               Richard E. Stafford
Isaiah Guttman                Robert Earl Stake
Roy Hastman                   E. Elizabeth Stewart
Robert Arthur Hehner          J. G. Thomas
Shinkuro Iwahara              Edward E. Ware
James L. Jacobson             Richard W. Willard
Thomas H. Jerdie              Calvin E. Wright
Henry F. Kaiser               Joseph L. Zinnes


It was moved, seconded, and passed that the 32 persons named below be elected as
student members.


G. Ernest Anderson, Jr.       George Walter Mayeske
Joan A. Barr                  James J. McKeon
Alan R. Bass                  Arthur Lee Miller
Carl E. Bereiter              Curtis R. Miller
Robert Besco                  Jason Millman
Vincent N. Campbell           Richard Millward
Ronald Lynn Flaugher          Alexander T. Quenka
Jane Garrett                  Donald Clare Ross
Bert A. Goldman               Malcolm J. Slakter
Marshall Greenberg            Timothy A. Smith
David L. Grese                Saul H. Sternberg
Richard Earl Hilligoss        Jack T. Tapp
Stephen M. Hunka              Mehmet F. Turgut
Edward S. Johnson             Bob J. Williams
Francis B. Kapper             Paul L. Williams
Eric Klinger                  Morris Wolfe


It was moved, seconded, and passed that the Membership Committee be thanked for
their excellent work.


Dr. Wilfred A. Gibson reported for the Program Committee. Of twelve abstracts of 
papers submitted for consideration for presentation at the Annual Meeting, ten were accepted. 
One symposium was scheduled. The Committee cooperated with the Division 5 Program Com- 
mittee in finding a speaker for the joint dinner and in scheduling the various technical and bus- 
iness sections. It was moved and seconded that the report be accepted with thanks. Motion 
passed. 

Dr. Harold Gulliksen reported for the Committee on the 25th Anniversary of the 
Psychometric Society. He reported general agreement upon the importance of an appropriate 
celebration. It was moved and seconded that the present Committee for the 25th Anniversary 
be discharged with thanks and that the incoming president appoint a committee of three to 
implement an appropriate plan for the occasion. Motion passed. 


It was moved and seconded that the Society appropriate a sum of up to $500.00 to be ex- 
pended in connection with the 25th Anniversary celebration of the Society. Motion passed. 


It was moved and seconded that, in connection with the annual billings for dues, the 
Treasurer invite voluntary contributions from the membership for executing plans in connec- 
tion with the 25th Anniversary celebration, with $2.00 suggested as a typical contribution. 
Motion passed. 


Dr. Charles F. Wrigley pointed out that 1960 is not only the 25th Anniversary of the 
Psychometric Society, but also the 100th anniversary of Fechner’s book and that it might be 
appropriate for the Psychometric Society to pay attention to this important anniversary. It was 
moved and seconded that this fact be brought to the attention of the committee arranging plans 
for the 25th Anniversary, together with the possibility of collaborating with the American 
Psychological Association in the matter of the Fechner centennial. Motion passed. 


The report of the Treasurer was presented by Dr. William B. Schrader. A copy is 
attached. The report was accepted with thanks. 


Dr. Ledyard R. Tucker reported for the Auditing Committee. The Treasurer’s books, 
bills received, cancelled checks, and bank statements for the fiscal year July 1, 1957 to June 
30, 1958 were inspected. All financial matters of the Society were found to be in good form 
and in accord with the report published in the December 1958 issue of Psychometrika. The 
report was accepted with thanks. 


On ballot, Dr. Philip H. DuBois was re-elected Secretary of the Psychometric Society 
for a term of three years, ending September 30, 1962. 


Dr. Wrigley reported for the Committee on Special Memberships. The Committee sug- 
gested that a category of "Foreign Affiliates" be established. Foreign affiliates would receive 
Psychometrika, would have the privilege of participating in the meetings of the Society, and 
would pay the same dues as student members; however, no endorsers would be required on 
applications and foreign affiliates would not have the right of holding office. Persons residing 
abroad who wish to become regular members could continue to do so, as at the present time. 
It was moved and seconded that steps be taken to establish this category of membership. Motion 
passed. 


The Committee on Special Memberships also suggested that there be established a cat- 
egory of "Emeritus Members" for individuals who have reached the age of 65 and who have 
paid full dues for 15 years. Such emeritus members would receive Psychometrika and would 
have full membership privileges, including the right to vote and to hold office. Suggested dues 
would be $1.00 a year. It was moved and seconded that steps be taken to establish such a cat- 
egory. Motion passed. 


The Secretary’s report was presented by Dr. DuBois. It was accepted with thanks. 
Reporting for the Election Committee, Dr. Clyde H. Coombs stated that Dr. Humphreys 
has been elected President of the Psychometric Society for a period of one year, beginning 
October 1, 1959. 


The meeting was adjourned at 5:00 P.M. 


Philip H. DuBois 
Secretary 
Psychometric Corporation (90% 
Psychometric Corporation (Publications) 227.56 
Secretarial Services 
RECEIPTS
Subscriptions (less agency discounts)       $6,799.80
Psychometric Society (90% of dues)           4,254.30
Sale of Back Issues (less discounts)         2,490.71
Sale of Monographs 5-8 (less discounts)        145.30
Interest on Savings Accounts                   275.63
Reprints                                       762.15
Mailing List Use                                    0
                                           $15,735.20
Printing and Mailing Psychometrika
  Volume 23, No. 2, through 24, No. 1       $7,696.40
Stipend of Managing Editor
  (7/1/58 - 6/30/59)                           750.00
Stipend of Assistant Editor
Stationery and Postage                         218.88
Legal Fees                                      35.00
Balance, June 30, 1958                      $7,242.74
Metropolitan Savings and Loan Assn.
  Los Angeles, California                    3,500.00
Total                                       13,242.74
Receipts, 1958-59                           15,735.20

Sum                                         28,977.94
Disbursements                               11,192.38
Remainder                                   17,785.56
Balance, June 30, 1959                     $10,785.56
Reserve Funds, June 30, 1959
Englewood Savings and Loan Assn.
  Englewood, Colorado                        3,500.00
Metropolitan Savings and Loan Assn.
  Los Angeles, California                    3,500.00
Total, Balance and Reserve Funds           $17,785.56
OBLIGATIONS
Estimated cost of Psychometrika,
Vol. 24, Nos. 2-4

  Printing and Mailing                      $6,000.00
  Stipends (7/1/59 - 12/31/59)                 750.00
  Secretarial Services                         550.00

                                            $7,300.00
BALANCE AND RESERVES, LESS OBLIGATIONS     $10,485.56













ERRATUM 


In Cureton, Edward E., Note on φ/φmax. Psychometrika, 1959, 24, 89-92, 
the first sentence of paragraph 2, page 89, should read “It is well known that 
φ can equal +1 only if p, = p. , and that it can equal −1 only if p, = ps 
([1], p. 324; [2], p. 342).” 